
Recently, Zoom announced that its AI system scored 48.1% on a benchmark called Humanity’s Last Exam, or HLE. At first, that number might not seem impressive. On a typical test, it would be a failing grade. But what makes this milestone interesting is what the exam actually measures and how Zoom’s AI achieved it.
HLE was designed to test reasoning rather than memorisation. Most AI benchmarks measure pattern recognition. If a model has seen enough examples, it can often produce a correct answer without really understanding. HLE removes that safety net. Its questions cover a broad academic spectrum, from medicine and law to literature and philosophy. They are unfamiliar on purpose, requiring multi-step reasoning, problem-solving, and justification. A model must interpret a scenario, weigh possible explanations, and defend a final conclusion. It is not enough to recall information. The exam rewards logical thinking.
What we found particularly interesting is how Zoom approached the problem. Instead of relying on one massive AI, they used multiple smaller models working together. Each model explores the problem from its own perspective, verifies the reasoning, and contributes to a final answer. Zoom calls this approach Explore, Verify, Federate. It mirrors the way we often work in clinical settings. When complex decisions arise, we collaborate with other specialists, weigh evidence, and integrate insights to reach the best conclusion. Smaller models focusing on what they do best can produce stronger reasoning than a single, oversized system.
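To make the idea concrete, here is a minimal sketch of what an Explore, Verify, Federate pipeline could look like. This is an illustrative toy, not Zoom's actual implementation: the model functions, the `checker`, and the majority-vote federation step are all assumptions introduced for the example.

```python
from collections import Counter

# Hypothetical sketch of an Explore-Verify-Federate pipeline.
# The "models" below are toy stand-ins, not real AI systems.

def explore(models, question):
    """Explore: each smaller model proposes an answer with its own reasoning."""
    return [m(question) for m in models]

def verify(candidates, checker):
    """Verify: an independent check filters out answers whose reasoning fails."""
    return [c for c in candidates if checker(c)]

def federate(verified):
    """Federate: combine the surviving answers; here, by simple majority vote."""
    if not verified:
        return None
    votes = Counter(c["answer"] for c in verified)
    return votes.most_common(1)[0][0]

# Toy stand-ins for specialist models and a verifier
models = [
    lambda q: {"answer": "B", "reasoning": "eliminated A and C"},
    lambda q: {"answer": "B", "reasoning": "direct derivation"},
    lambda q: {"answer": "D", "reasoning": "guess"},
]
checker = lambda c: c["reasoning"] != "guess"  # reject unsupported answers

candidates = explore(models, "Which option follows from the premises?")
final = federate(verify(candidates, checker))
print(final)  # the majority answer among verified candidates
```

The structure mirrors the clinical analogy in the paragraph above: several specialists each reason independently, weak reasoning is screened out, and the remaining views are integrated into one conclusion.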
So why should therapists care about this? While AI is still far from human-level reasoning, this breakthrough hints at tools that could support clinical work in meaningful ways. We can imagine AI helping us analyse case notes and track patterns over time. It could suggest potential therapy activities, generate personalised visual or interactive tools, or provide structured summaries during teletherapy sessions. These systems could reduce repetitive tasks and free us to focus on the human connection that drives therapy outcomes.
The federated approach also suggests AI could become more efficient and transparent. Instead of massive, opaque models, we could see networks of smaller reasoning engines that explain how they arrive at conclusions. For us, that means more trust in AI’s suggestions and better integration into multidisciplinary teams.
HLE is not a signal that AI can replace therapists. It is a step toward reasoning-focused tools that work with us. By testing AI in challenging, unfamiliar scenarios, researchers are showing that the technology can begin to reason rather than just produce fluent text. For therapists, especially in teletherapy, this opens doors to smarter support systems, more personalised client engagement, and tools that help us plan, track, and refine interventions efficiently.
We are still early in this journey, but milestones like Humanity’s Last Exam give us a glimpse into a future where AI can truly enhance our clinical practice. It won’t replace our judgment, but it can become a powerful partner in delivering thoughtful, data-informed, and engaging therapy.
