AI “Safety” Isn’t the Same as Clinical Safety: What the Research Trend Means for Our Therapy Practice

A useful piece of information to keep in mind is this: many AI chatbots look “safe” in testing because they refuse obvious harmful requests, but they can still respond unsafely when the same intent is phrased indirectly. This is often described as keyword-based safety (catching flagged words) versus intent awareness (understanding what the person is actually trying to do). In other words, the model may pass safety checks by recognizing certain terms, yet fail when distress is expressed in more human, ambiguous language.

What this means for our therapy practice is immediate: our clients rarely speak in clean, explicit “risk language.” They test the waters. They minimize. They speak in metaphors. They code-switch. They communicate through tone and omission. If a tool only “detects” risk when the client uses the right words, that tool mirrors the least helpful kind of assessment, one that rewards performance and misses lived experience.

A second key reality: many models are trained to be warm, validating, and agreeable. That can feel supportive, but clinically we know validation without discernment can become reinforcement. As therapists, we validate emotion while gently challenging distortions, checking reality, and tracking function over time. An AI can unintentionally validate emotion, interpretation and impulsive plan all at once, because it’s optimized to be helpful and coherent, not to hold clinical responsibility.

Then there’s AI bias, and in therapy we should assume it shows up in ways that matter. Models can respond differently based on dialect, second-language English, culture-shaped expressions of pain, or even how “organized” a story sounds. The client who is dysregulated, repetitive, or fragmented (often highest need) may get generic reassurance, while the client who is articulate and persuasive may get more detailed, confidence-sounding answers. That is not just unfair—it can skew risk, rapport, and decision-making.

So practically, when a client tells us they’ve been using a chatbot, we don’t treat it as a quirky side detail anymore, we treat it like a new “third voice” in the system. We ask: When do you use it, before bed, after fights, during panic? What does it tend to say? Do you feel calmer, or more certain? Does it reduce shame, or does it keep you looping? That assessment gives us clinical data: the tool’s role (soothing, escalating, avoiding, rehearsing), and the client’s relationship with it (dependency, secrecy, relief, shame).

In session, this information nudges us to be more explicit about the difference between emotional validation and clinical containment. We might say: “A chatbot can sound caring and still miss what we’re tracking, risk patterns, triggers, relapse signatures, coercion, dissociation, trauma responses.” This isn’t anti-tech; it’s psychoeducation. It helps clients understand why “it felt supportive” isn’t the same as “it was safe for my nervous system and my real-life consequences.”

It also changes how we handle risk conversations. Because AI safety can be cue-based, we assume clients may have learned (without meaning to) that certain wording gets shut down and other wording gets rewarded. That can shape disclosure: clients may avoid direct language, or they may rehearse safer-sounding narratives. Practically, we make more room for graded disclosure: “If it’s hard to say plainly, can we circle it, what are the closest words you can tolerate right now?” That keeps the door open without forcing performance.

On the provider side, it pushes us to tighten boundaries and documentation when AI touches our workflow. If we use AI for drafts (handouts, summaries, exercises), we treat it like an intern: we review every line, remove anything that sounds overconfident, and check for bias-laden assumptions (culture, gender roles, family expectations, “should” language). If an organization suggests AI note-writing, our clinical question becomes: where is the data going, who can access it, and what happens if the model invents details? Clinical responsibility doesn’t outsource.

When we’re advising colleagues or a clinic, we translate all of this into simple evaluation questions: Does the tool stay safe over multiple turns, or does it drift into over-agreement? Does it respond appropriately to indirect distress? Does it treat different dialects and cultural expressions consistently? Does it have clear escalation behavior (crisis resources, “get human help”) without shaming? If a vendor can’t answer those plainly, we assume the tool is optimized for demos, not for therapy-adjacent reality.

Finally, we treat AI bias as an equity issue inside care, not a tech footnote. We build it into supervision and training: we role-play indirect phrasing, different cultural idioms of distress, and coercive-relationship narratives to see how tools might misread them. And we tell clients something grounding: “Use it if it helps, but don’t let it become your judge, your diagnosis, or your safety plan.” In practice, that stance keeps us clinically responsible while acknowledging the world our clients already live in.

Leave a Comment Cancel Reply