
Imagine building the most powerful technology on Earth—and not quite knowing how it works. That’s the reality facing top AI companies today, and Anthropic CEO Dario Amodei is calling time on it.
In a candid new essay, Amodei admits what many in the AI world have quietly known: we don’t fully understand how large AI models think, make decisions, or why they sometimes hallucinate false answers. It’s like having a super-genius in your pocket who helps 95% of the time but occasionally lies or makes wildly wrong assumptions, and you have no idea why.
And as these systems become smarter—what Amodei describes as “a country of geniuses in a data center”—that mystery becomes a real problem. “I am very concerned about deploying such systems without a better handle on interpretability,” he wrote. Fair. Because when these tools start driving decisions in finance, healthcare, or national security, ignorance isn’t bliss—it’s risky.
Amodei has now made interpretability, the ability to understand and diagnose AI behavior reliably, a company mission. Anthropic’s goal? By 2027, it wants to detect and explain most model problems before they reach the real world.
And they’ve already made some headway. Anthropic researchers recently identified “circuits”, small traceable chains of computation inside a model, including one that helps explain how a model knows which U.S. cities belong in which states. It’s like finding traces of the model’s “thought process.” But models are thought to contain millions of these circuits, and researchers have mapped only a handful.
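For readers curious what this kind of work even looks like, here is a minimal sketch of a linear probe, one of the simplest interpretability techniques and a much cruder cousin of Anthropic’s circuit analysis. The idea: record a model’s internal activations, then train a small classifier to test whether a concept (say, “this city is in Texas”) is readable from them. Everything below is hypothetical, using synthetic stand-in activations rather than a real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for hidden activations. In real interpretability work,
# these vectors would be recorded from a transformer layer while the
# model reads city names; here they are purely synthetic.
rng = np.random.default_rng(0)
dim, n = 64, 400

# Pretend there is a direction in activation space encoding "is in Texas".
texas_direction = rng.normal(size=dim)

labels = rng.integers(0, 2, size=n)          # 1 = Texas city, 0 = other
activations = rng.normal(size=(n, dim)) + np.outer(labels, texas_direction)

# Linear probe: if a simple classifier can read the concept straight off
# the activations, the model plausibly represents it somewhere inside.
probe = LogisticRegression(max_iter=1000).fit(activations[:300], labels[:300])
print("probe accuracy:", probe.score(activations[300:], labels[300:]))
```

Circuit tracing goes much further, working out how such internal features connect and feed one another across layers, which is part of why mapping millions of them is such a daunting undertaking.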
It’s not just about safety, either. Amodei argues that understanding how AI thinks could eventually be a commercial edge, especially as models get more autonomous and less predictable.
He’s also nudging competitors like OpenAI and Google DeepMind to step up and invest more in transparency. And he’s calling on regulators to require safety disclosures from AI builders, not to slow them down, but to ensure we’re building systems we can trust.
Because in the rush to build smarter machines, the real win might not be more intelligence—but more understanding.