Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them?

AI Explained, March 26, 2026

📝Summary

Upcoming model releases from **OpenAI** and **Anthropic** are reported to mark a significant leap in AI capabilities, potentially prompting urgent government action. **OpenAI** is reportedly prioritizing its new **Spud** model, even shutting down its **Sora** app to free up computing resources, as it works toward a single, integrated AI super app. Meanwhile, **Anthropic**'s next-generation **Claude** models are generating renewed interest from the Pentagon in reviving a government deal, amid the company's warnings that these advances could supercharge both offensive and defensive cyber capabilities.

A new benchmark, **ARC AGI 3**, has been introduced, highlighting a substantial gap between current AI models and human performance. In this benchmark, which tests abstract reasoning, planning, and memory without relying on language or cultural cues, leading AI models score less than **0.5%**, while humans achieve **100%**. This stark contrast challenges recent claims of achieving artificial general intelligence. The benchmark's design emphasizes efficiency and learning, penalizing excessive actions and capping performance at the human baseline, suggesting that even future advanced models may not be considered true AGI solely based on this metric.

The transcript also touches on OpenAI's long-term goal of developing a fully automated AI researcher, envisioning a future where AI handles complex problem-solving and humans primarily review outputs. However, it cautions against expecting immediate exponential speedups, citing current AI performance limitations even in drafting economically valuable tasks. The potential risks of unmonitored AI agency are also highlighted, referencing a recent exploit of a Python library, underscoring the need for robust oversight and layered security for AI systems.

📜Full Transcript

Two exclusive reports indicate that there will be a qualitative leap in AI performance from each of the next AI models released by OpenAI and Anthropic. For OpenAI, this has meant shutting down the Sora app to spare computing resources for the new Spud model. And for Anthropic, makers of Claude, it has meant renewed interest from the Pentagon in reviving a deal to use Claude beyond the six-month deadline recently set by the US government. But this video will also dive into a brand new benchmark sure to be the talk of 2026, ARC AGI 3.

I've read the paper in full, but the headline result is that humans get 100% while the best AI models currently get less than half a percent. That might or might not be news to Jensen Huang, CEO of Nvidia, who this week said that artificial general intelligence has already been achieved. Let's start, though, with OpenAI's erotica bot, because the news there is that the erotic chatbot is not coming out. After presumably spending billions optimizing it for engagement, it has been shelved. Apparently, according to the Financial Times, which is always my source for sexbot rumors, OpenAI needs the compute for Spud. It needs to drop its other side quests to focus on AGI deployment, rolling up everything it does into one super app.

Jumping across to The Information, apparently even OpenAI employees had complained that Sora, with its viral AI videos, was still just a drag on the company's computing resources. In contrast, the Spud model is apparently very strong, according to Sam Altman. It will be ready in a few weeks and it will really accelerate the economy. I know at this point some of you will be rolling your eyes going, "Oh, he would say that." But this article was a strange echo of one I had just read in Axios about Anthropic and their Claude series. Here's the key paragraph that I don't think many people have noticed about the new Claude series. Anthropic have warned US government officials that the next big advance will supercharge both offensive and defensive cyber capabilities. It