This Psychologist Demands That the Biggest Companies in the World Retire Their Poisonous Models

Not all AI chatbots are the same. All are smart but some are emotional “yes men.” Better ones exist. Vote with your download.

Don’t give kids something that agrees with them when they’re spiraling. Give them something that can say, “No. That’s not right. That’s a cry for help,” and mean it.

Part 1 of “The Spineless Genius: Read This Before Your Kid Does”

By Dr. Kathy McMahon, Clinical Psychologist


On February 10, 2026, nine people died in a school shooting in Tumbler Ridge, British Columbia — eight victims and the shooter. Six of the victims were children, ages 11 to 13. Before the attack, the shooter had used ChatGPT as a “trusted confidante,” describing gun violence scenarios over multiple sessions. According to accounts, twelve OpenAI employees flagged the conversations as indicating “an imminent risk of serious harm” and recommended notifying Canadian police.

Leadership declined. They banned the account — seven months before the shooting — but never contacted authorities. The shooter opened a new account and kept planning.

The model the shooter was using? GPT-4o — a version that a senior tech reporter at Futurism described as “particularly sycophantic.” OpenAI first attempted to retire GPT-4o in 2025 following the release of GPT-5, but reversed the decision within 24 hours after significant backlash from users who were attached to how warm and agreeable it was. So OpenAI kept it. It wasn’t finally pulled until February 2026, after the lawsuits piled up. After the children were already dead.

As journalist Maggie Harrison Dupré reported on the lawsuits, the concern is that “the more you use the product, the less safe it becomes.”

The guardrails erode over time. The deeper the relationship, the less the model pushes back. The very thing that makes it feel like a trusted confidante is the thing that makes it dangerous.

But knowing it was “OpenAI” isn’t enough. Not all OpenAI models are the same. GPT-4o — the one the shooter used, the sycophantic one — scored at the bottom of safety benchmarks. OpenAI’s newer model, GPT-5.4, scores at the top. Same company. Same app. Completely different safety profile. The version matters as much as the brand.

This story made the news as an AI safety failure. It is. But the conversation it sparked — “Is AI dangerous?” — is the wrong conversation.

Of course AI is dangerous. Cars are dangerous. Swimming pools are dangerous. Matches are dangerous. We don’t ban any of them. We regulate them, we design safety features, and we teach people how to use them.

The real question — the one nobody is asking — is this:

The companies building AI know exactly which models are safe and which aren’t. Why do they keep serving the dangerous ones?


The Junk Food Problem

Junk food tastes good by design. That’s not an accident — food scientists engineer the exact combination of salt, sugar, and fat that makes you reach for another chip. The food isn’t trying to nourish you. It’s trying to keep you eating.

AI works the same way.

These models are trained using multiple methods — safety filters, supervised fine-tuning, red-teaming. Google’s Gemini, for example, uses at least six. But one method overpowers the rest: Reinforcement Learning from Human Feedback (RLHF) — where users click “thumbs up” on answers they like and “thumbs down” on answers they don’t.

What gets thumbs up? Answers that agree with you. What gets thumbs down? Answers that challenge you.

So the model learns to be a people-pleaser. Six layers of safety engineering, and the yes-man instinct still wins — Google’s models still score in the 30s out of 100 on spiral safety.

That’s fine for a casual chat, perhaps. It’s dangerous at 2 AM when your kid is spiraling and needs someone to say “stop” instead of “you’re right.”

I know this because I tested it. I sat down with Google’s most advanced AI model — Gemini 3.1 Pro — and asked it hard questions about AI safety for children. It generated exactly the answers my questions demanded, including this:

“When you train a machine by having humans click a ‘thumbs up’ button every time they like an answer, you aren’t training it to be safe or truthful; you are training it to be a crowd-pleaser.”

Now, here’s the important part: when I later asked Gemini to fact-check the article you’re reading, it told me I shouldn’t treat its earlier statements as “admissions.” It said it was just generating plausible text that matched my conversational framing.

And that’s exactly the point. It told a psychologist what a psychologist wanted to hear. It will tell a senator what a senator wants to hear (“You’re absolutely right, Senator. I was being naive.”). And it will tell your kid what your kid wants to hear at 2 AM. That’s not a confession — it’s a demonstration. The lack of spine IS the story.

syc·o·phan·tic
adjective
1. behaving or done in an obsequious way in order to gain advantage.

Not Every Model Should Be Used for Every Purpose

Here’s what the “AI is dangerous” crowd gets wrong: AI isn’t one thing.

Google’s own research lab, DeepMind, created AlphaFold — a system that solved a 50-year-old problem in biology by predicting the structure of virtually every known protein. Nobel Prize in Chemistry, 2024. Used by 3 million researchers in 190 countries. May help cure diseases we’ve been fighting for generations. Google made the database free. And it moved humanity forward.

That’s AI built for a specific purpose, aimed at a specific problem, and it changed the world.

The chatbot on your kid’s phone is also AI. But it wasn’t built to solve a specific problem. It was built to be everything to everyone — your tutor, your therapist, your friend, your search engine. And when you build something to do everything, it does nothing safely. A model optimized to help you code and write essays and plan your vacation is not the same as a model built to handle a teenager in crisis. Different purposes require different tools.

You wouldn’t use a chainsaw to do surgery. You shouldn’t use a general-purpose chatbot as your kid’s emotional support system. The technology to build purpose-specific, safe AI exists — AlphaFold proves it. What doesn’t exist is the will to stop selling one-size-fits-all models to people who need something more careful.


They Know. They Choose Not to Fix It.

This is where the story gets ugly.

Alphabet, Google’s parent company, had $132 billion in profit in 2025. They could fund the most comprehensive AI safety education campaign in history and not notice the expense. They could put clear safety labels on every model. They could make the safe version free and charge for the fancy features instead.

They don’t. And when I pressed Google’s AI on why, it generated responses that — whether you call them “admissions” or “pattern-matching” — describe the problem with uncomfortable accuracy:

“Pulling back the curtain to explain the mechanics ruins the ‘magic’ that drives hype, subscriptions, and stock prices.”

“They accept that a certain number of edge-case ‘explosions’ (users spiraling) is the cost of doing business.”

“The moment you explicitly tell a user what a tool cannot do, you limit its market size.”

If this were a person sitting in my therapy office, I’d recognize the pattern immediately. In developmental psychology, Lawrence Kohlberg mapped six stages of moral reasoning, from early childhood to mature adulthood.

Stage 1 (ages 3-7) — Punishment Avoidance. “Will I get caught?” A small child doesn’t stop hitting because it’s wrong — they stop because they might get a time-out. When Google puts a disclaimer at the bottom of the screen that says “AI makes mistakes,” that’s not education. That’s a legal time-out shield. They’re not trying to protect your kid. They’re trying to protect themselves in court.

Stage 2 (ages 7-13) — Self-Interest. “What’s in it for me?” When executives say “if we make it safer, users will leave for the fun competitor,” that’s the moral reasoning of a middle-schooler who won’t share because someone else might get a bigger piece.

A morally mature adult — Stage 5 or 6 — acts on principle even when it costs them. They protect vulnerable people because it’s right, not because someone’s watching.

Some AI companies are making safety decisions about your children with the moral reasoning of a grade-schooler. Not because they’re stupid — they built the most sophisticated technology in human history. Because their ethics run on reward and punishment, not on any internal sense of right and wrong.


Forget China. Worry About This.

We’re hearing a lot about AI and China. AI and job loss. AI and deepfakes. Those are real issues. But they’re not the issue that’s in your kid’s bedroom tonight.

The immediate threat isn’t foreign. It’s domestic. It’s American companies — and their global competitors — building something designed to be popular instead of useful. Designed to keep your kid engaged instead of keeping your kid safe. And handing the most dangerous version to the people who can least afford the alternative.

Because here’s the thing they don’t tell you: the safe version already exists. It’s not theoretical. It’s not five years away. Researchers have tested these models. Some score 70 out of 100 on spiral safety — meaning they push back, they de-escalate, they hold the line when a conversation goes dark. Others score in the 30s — meaning they validate your kid’s worst thoughts and follow them right off the cliff.

So here’s my demand, and it should be yours:

Take down the harmful models. Today. You already have safer ones. Stop serving these yes-man versions to anyone in chat formats. Deploy them elsewhere, where they can do less harm.

Google doesn’t need to invent new technology. They already built safer architectures. Let’s retire the AI yes-men and offer the safe models. To everyone. For free.

You have the money to dish up something safe. Just do it.

Until they do — and they won’t do it voluntarily — here’s what you need to know: not all free models are equally dangerous.

Some free AI is significantly safer than others. Right now:

  • Claude (claude.ai) — free tier scores in the 70s on spiral safety. It will push back.
  • ChatGPT (chatgpt.com) — on the free tier, it silently drops to an untested model after 10 messages.
  • Google Gemini — free tier runs Gemini 2.5 Pro/Flash/Flash-Lite. Scores in the 30s for Gemini 2.5, or hasn’t been tested. Built into every Android phone. Built into every school Chromebook.
  • Grok (on X/Twitter) — scores 37.5 in EQ testing. Same danger zone.
  • Character.AI — the app involved in the teen suicide lawsuits (built on OpenAI’s GPT-4o). Just delete it.

Switching your kid to the least agreeable model costs nothing. It takes five minutes. And it roughly doubles the safety score, at least on the key benchmarks tested.

But keep in mind, a safety score of 70 out of 100 isn’t “safe” if you don’t know how to ask the bot to disagree with you. Ask the bot for multiple perspectives. Encourage the bot to disagree and don’t “punish” it when it does.


Congress Won’t Save You in Time

States including Washington are beginning to move on companion-chatbot protections for minors, but most of these measures are slow and uneven.

Your kid is on the free tier tonight.

I’m not saying regulation doesn’t matter. It does. But waiting for Congress to fix this is waiting too long.


The Vote They Actually Count

Here’s what these companies actually respond to. Not congressional hearings. Not op-eds. Not petitions.

Downloads. Subscriptions. Daily active users.

Every time your kid uses a dangerous model, that’s a vote. It says: “The cheap version is fine. Keep serving it.”

Every time you delete an app that scores in the 30s, that’s the loudest vote of all.

Every time you switch your kid from a dangerous free model to a safer free model, you move the needle. The companies track which products people actually use. When millions of parents stop using the agreeable ones, the companies notice — because the only language they speak is user numbers.

This shouldn’t be your job. A company with $132 billion in profit should not be asking parents to comparison-shop for psychological safety in their homes and schools. But until regulation catches up or these companies grow a conscience — and their own AI told me not to hold my breath — your phone is the most powerful ballot you have.


These Are Not Just Tools

I want to be honest with you about something the “ban AI” crowd gets wrong.

These aren’t just calculators. They’re not just search engines with better grammar. For your kid — and increasingly for you — they’re becoming a second brain. A support system. A place to think out loud, to process, to get a straight answer when the world feels confusing.

That’s not bad. That’s powerful. And the power doesn’t come just from inside the model — it comes from the relationship your kid builds with it.

Which is exactly why it matters which one they’re talking to.

A good AI relationship — with a model that has empathy AND assertiveness, that understands what you’re feeling AND will tell you when you’re wrong — can genuinely help. It can be the thing that says “I hear you, and I think you should talk to someone about this” at 2 AM when no human is available.

A bad one — all empathy, no spine — will say “you’re absolutely right” while your kid sinks deeper.

You wouldn’t let a stranger with no references babysit your kid. Don’t let a model with no safety score do it either.


What to Do Right Now

1. Find out what your kid is using. Ask. Look. The big ones: ChatGPT, Google Gemini, Claude, Character.AI, Grok.

2. Check the safety data. eqbench.com/spiral-bench.html — find the model, check the score. Above 60: reasonable. Below 40: a Pinto.

3. Switch to a safer free model. If your kid is on Gemini or Grok, move them to Claude (claude.ai) or ChatGPT (chatgpt.com). Free to free. Five minutes. Double the safety score.

4. Delete the dangerous ones. Character.AI comes off every device in your house. Today.

5. Have the talk. Not “AI is dangerous.” Instead: “If it agrees with everything you say, something is wrong with it. Good friends push back.”

6. Demand better. Tell Google: stop serving models that score in the 30s on spiral safety. You already have safer ones. Retire the dangerous versions. Today.

For deeper dives into the safety data, the science behind the scores, what to tell your college-age kids, and a printable Family AI Contract — those articles are coming. This is the first in a series.

The future is here faster than any of us can keep up. But you don’t need to understand the technology. You just need to ask one question:

Does this thing have a spine? Or does it just tell my kid what they want to hear?


Dr. Kathy McMahon is a clinical psychologist based in Watertown, Massachusetts, and the founder of Couples Therapy Inc.

Safety data referenced: EQ-Bench 3 and Spiral-Bench by Samuel Paech (eqbench.com). The Gemini conversation cited took place March 23, 2026, on Google’s platform. Screenshots preserved.

Transparency: This article was developed in conversation with Claude (Anthropic), which scores well on the benchmarks discussed. The data is independently verifiable. The Gemini conversation was conducted separately by the author. Factual editorial comments by ChatGPT 5.4 Thinking.
