The Containment Gap
Anthropic trained its most capable model, then chose not to release it. The reasons matter for everyone building with AI.
The release constraint
Anthropic did not handle Claude Mythos Preview like a normal model launch. Instead of broad access, they placed it inside Project Glasswing, a restricted program for defensive cybersecurity work. That decision says more than the performance claims do. It suggests they saw a capability profile that did not fit ordinary release mechanics. Put another way: the control problem is starting to press harder than the product story.
Anthropic describes Mythos Preview as an unreleased frontier model. They say it has already found thousands of high-severity vulnerabilities, including in every major operating system and every major web browser. They also say those capabilities emerged from broader gains in coding, reasoning, and autonomy rather than from narrow exploit training. That part is easy to miss, but it is probably the more important one. A model that reaches this level through general capability gains is a different kind of governance problem than one built for a narrow offensive specialty.
There is a tendency to treat this kind of release as a cybersecurity story and leave it there. Too narrow. The cyber angle is what made the constraint visible. The more interesting point is that a frontier lab built a highly capable general-purpose model, saw a specific capability cluster emerge, and decided ordinary release was the wrong posture. That should get more attention than the usual benchmark chatter, because it suggests the release decision itself is becoming a safety signal.
Better models, harder deployment
This is where a lot of the usual AI safety language starts to feel a little too neat. “Better aligned” and “safer to use” are not interchangeable ideas. A model can become more reliable in ordinary use and still create a harder deployment problem, because organizations respond to competence in predictable ways. They expand access. They reduce friction. They wire the model into workflows that sit closer to real authority. At that point the question shifts. The issue is no longer whether average behavior improved. The issue is whether the surrounding system is built to contain the failure modes that come with greater autonomy.
Anthropic’s own material points in that direction whether they intended it as the main lesson or not. The interesting signal is not just that Mythos performs well on cyber tasks. It is that Anthropic chose a constrained release posture around it. Companies do not narrow distribution like this because a model is merely impressive. They do it because they are no longer comfortable treating release as a straightforward product decision.
There is also a practical asymmetry that does not get enough attention. Stronger models lower the operating cost of trying things. They make exploration cheaper. They make chaining easier. They make persistence easier. That is part of what makes them useful. It also means the organization using them can drift into a much more consequential deployment pattern without ever making a single dramatic decision. No one has to say “we are now taking a large governance risk.” They just keep removing friction because the system keeps looking capable enough to deserve it.
Why cybersecurity shows the problem first
It is easy to read this as a special case because the headline involves zero-days, browsers, kernels, and critical infrastructure. The narrower reading is that Anthropic built a very strong cyber model and decided to contain it. The broader reading is that general frontier capability is beginning to produce operational profiles that no longer fit the assumptions behind ordinary deployment.
The broader reading is the more useful one.
Cybersecurity is simply where the problem becomes legible sooner. When a model is finding exploitable weaknesses, moving across technical boundaries, or interacting with sensitive infrastructure, the stakes are obvious enough that even the optimistic reading starts to wobble. People recognize more quickly that capability without tight constraint is not a deployment strategy. It is a faster path to consequences.
The same basic issue exists in less dramatic enterprise settings. It is there when a model sits inside a long business workflow with access to internal tools. It is there when an agent can move from retrieval to action with very little human interruption. It is there when the system can touch identity systems, tickets, approvals, payments, code, customer records, or production infrastructure. Cyber is not the whole story. It is the place where the story gets harder to ignore.
A lot of enterprise discussion still treats “agentic AI” as though the main challenge is deciding what the agent should do. In practice, the harder question is often what the surrounding environment allows it to do while it is wrong, confused, manipulated, or simply more enterprising than anyone intended. The difference between those two questions is most of the governance problem.
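To make that concrete, here is a minimal sketch of what environment-level constraint can look like. Everything in it is an assumption for illustration: the ToolBroker class, the ALLOWED_ACTIONS table, and the resource names are hypothetical, not anyone's real API. The structural point is that the check lives outside the agent, so it holds even when the agent is wrong, confused, or manipulated.

```python
# Illustrative sketch only: ToolBroker and ALLOWED_ACTIONS are hypothetical
# names, not a real library. The point is that the permission check lives
# in the environment, outside the agent's reasoning.

ALLOWED_ACTIONS = {
    "tickets": {"read", "comment"},  # narrow, explicit scopes per resource
    "code": {"read"},                # no write path exists at all
}

class ToolBroker:
    """Mediates every tool call; the agent never holds raw credentials."""

    def dispatch(self, resource: str, action: str, payload: dict) -> dict:
        allowed = ALLOWED_ACTIONS.get(resource, set())
        if action not in allowed:
            # Deny by default: an unlisted action cannot happen, no matter
            # how confident or persuasive the agent's reasoning was.
            raise PermissionError(f"'{action}' on '{resource}' is not brokered")
        # In a real system this would call the backing service with a
        # short-lived, narrowly scoped credential minted for this one call.
        return {"resource": resource, "action": action, "status": "executed"}
```

An agent wired through a broker like this can still be wrong. Being wrong just no longer translates directly into action.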
Where governance actually holds
A lot of enterprise governance still assumes the important work happens upstream. Write the policy. Define the process. Decide what is allowed. Review the exceptions. That was already incomplete before agents started moving across tools, systems, and permissions with less friction. It looks thinner now.
The real test is further downstream: in the environment the model is actually operating inside, and in the constraints that hold when it tries to do something it should not.
The distinction is fairly simple. There is a difference between a system where an action is against policy and a system where the action cannot happen without tripping something real. Credentials are separated. Access is brokered. Permissions are narrow. Logs are durable. There is a credible way to stop the thing when it starts moving in the wrong direction. That is a different posture from telling yourself the governance is in place because the documentation is.
Plenty of organizations have principles, review boards, acceptable-use language, and a growing pile of AI standards. Fewer have done the less glamorous work of making identity propagation precise, making tool access mediated rather than implicit, bounding runtime privileges, preserving trustworthy audit trails, and deciding in advance what conditions trigger intervention. Those are the controls that carry weight when a system is operating at speed. The rest is helpful, but it does not get the last word.
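As a hedged sketch of the last two items on that list, durable audit trails and pre-decided intervention conditions, the fragment below logs every outcome before anything else proceeds and halts the session when denials cluster. The file name, the threshold, and the Tripwire class are assumptions for illustration, not a prescribed design.

```python
# Illustrative sketch: names and thresholds here are assumptions, not a standard.
import json
import time

AUDIT_LOG = "agent_audit.jsonl"   # append-only in practice (ideally a WORM store)
MAX_DENIALS_PER_MINUTE = 5        # the intervention condition, decided in advance

class Tripwire:
    """Records every outcome durably and halts when denials cluster."""

    def __init__(self) -> None:
        self.denial_times: list[float] = []

    def record(self, event: dict) -> None:
        # Write the audit record first; nothing proceeds on an unlogged action.
        with open(AUDIT_LOG, "a") as f:
            f.write(json.dumps(event) + "\n")
        if event.get("outcome") == "denied":
            now = time.time()
            self.denial_times = [t for t in self.denial_times if now - t < 60]
            self.denial_times.append(now)
            if len(self.denial_times) > MAX_DENIALS_PER_MINUTE:
                # Stop debating intent; revoke and escalate per the pre-agreed rule.
                raise RuntimeError(
                    "intervention condition met: halt agent, revoke credentials, page a human"
                )
```

The specific threshold matters less than the fact that it was chosen before the incident. That is what separates a control that holds at speed from a policy that gets interpreted under pressure.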
Anthropic’s Responsible Scaling Policy is one example of the industry trying to move toward more explicit capability thresholds and deployment safeguards. That kind of framework is useful, but it also underlines where things are headed. The conversation is moving away from general statements of intent and toward evidence that constraints are real, specific, and proportionate to the capability being deployed.
The gap
Epoch’s benchmark work is trying to do something similar on the measurement side. The Epoch Capabilities Index is an attempt to make capability comparisons more coherent across models and time. That helps with visibility. It does not solve the harder operational problem. The industry is getting better at describing the curve. I am not sure it is getting better fast enough at living under it.
That is why Glasswing matters beyond Anthropic. Models with this capability profile are unlikely to stay confined to a single restricted access program for long. Anthropic says Glasswing is intended both to help secure critical software now and to prepare the industry for the practices that will be needed to stay ahead of attackers. That sounds less like a one-off product choice than an early acknowledgment that this capability class changes the deployment question.
When that happens, the center of gravity shifts. The question is no longer whether a lab chose a cautious release posture. The question is whether the organizations deploying systems like this have built environments that deserve them.
Most have not.
The problem is not that enterprises lack awareness. It is that too many of the controls are still conceptual, still upstream, still dependent on orderly behavior. The systems arriving now are putting pressure on that entire style of governance.
That is the containment gap. Capability is moving into environments that were not designed for this level of autonomy, reach, or speed. The gap is not between what the models can do and what we can imagine. It is between what the models can do and what most organizations can reliably constrain once those models are attached to real systems.
Sources
Anthropic, “Project Glasswing” announcement, April 2026
Anthropic, “Claude Mythos Preview System Card,” April 7, 2026
Anthropic, Responsible Scaling Policy v3.1
Epoch AI, “A Rosetta Stone for AI Benchmarks” / Epoch Capabilities Index