Learning brief
Generated by AI from multiple sources. Always verify critical information.
TL;DR
Anthropic built an AI model called Claude Mythos that's so good at hacking they won't release it publicly. It found vulnerabilities in critical infrastructure during testing and outperforms every other model at cybersecurity tasks. They're using it internally for Project Glasswing, a program to find and fix security holes in essential software before the bad guys do.
What changed
Anthropic created Claude Mythos, a general AI model with elite hacking skills — then decided not to release it.
Why it matters
First time an AI lab has built its most capable model specifically to keep it locked down for defensive security work.
What to watch
Whether other AI labs follow suit or race to release their own offensive-capable models anyway.
What Happened
Anthropic just announced Claude Mythos Preview, their most capable AI model to date — and immediately said you can't have it. Unlike every previous Claude release (Haiku, Sonnet, Opus), Mythos isn't going into the API or the consumer app. It stays behind Anthropic's firewall (Source 5, Source 8, Source 11).
Why? Because Mythos is disturbingly good at cybersecurity. During internal testing, it successfully identified and exploited vulnerabilities in real-world critical infrastructure systems (Source 3, Source 7). Anthropic published a 244-page system card detailing exactly how capable — and dangerous — the model is (Source 4, Source 16). This isn't a specialized hacking tool; it's a general-purpose model that happens to excel at offensive security when you ask it to.
Instead of selling access, Anthropic is deploying Mythos exclusively for Project Glasswing — an initiative to scan critical software infrastructure (think: power grids, water systems, financial networks) for security holes and fix them before attackers find them (Source 3, Source 12, Source 20). Think of it like hiring the world's best burglar to test your locks, except the burglar is an AI that never sleeps and can check thousands of systems simultaneously.
The model also leaked accidentally — sort of. Someone at Anthropic left 3,000 internal documents on a public server, including references to Mythos before the official announcement (Source 14). Separately, the Claude Code source code (Anthropic's coding assistant tool, different from Mythos) leaked when over 8,000 copies appeared online after a configuration mistake (Source 2, Source 26). Neither leak exposed the actual Mythos model weights, but they revealed enough internal details to confirm its capabilities.
Claude Mythos outperforms GPT-5, Gemini Ultra, and the current Claude Opus 4.6 on coding benchmarks — particularly security-related tasks (Source 20, Source 29). Anthropic's Python SDK added support for claude-mythos-preview as a model identifier, but only for internal/research use (Source 27).
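The SDK change amounts to a new allowed value for the `model` parameter in the Anthropic Python SDK's `messages.create()` call. A minimal sketch of how a client might assemble such a request and gate the internal-only identifier; the allowlist, the gating logic, and the `claude-opus-4-6` model string are illustrative assumptions, not Anthropic's actual code:

```python
# Sketch only: the allowlist and model strings below are hypothetical,
# not part of the real Anthropic SDK.
INTERNAL_ONLY_MODELS = {"claude-mythos-preview"}  # internal/research use only (Source 27)

def build_request(model: str, prompt: str) -> dict:
    """Assemble the keyword arguments a messages.create()-style call expects."""
    if model in INTERNAL_ONLY_MODELS:
        # A public API key would be rejected server-side; fail early instead.
        raise PermissionError(f"{model} is not available via the public API")
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("claude-opus-4-6", "Audit this function for injection flaws.")
```

In practice the returned dict would be passed to the SDK client; the point is simply that `claude-mythos-preview` is a recognized identifier that public credentials cannot use.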
So What?
This is the first major AI lab explicitly choosing security over market share. Every previous frontier model launch followed the same pattern: build the best thing you can, then race to get it into customers' hands before your competitors do. Anthropic just broke that cycle. They built their most capable model and locked it down because releasing it would hand offensive capabilities to anyone with an API key.
The uncomfortable truth: Mythos probably represents where all frontier models are headed. As AI systems get better at reasoning and code execution, they inherently get better at finding and exploiting software vulnerabilities. Anthropic isn't special here — they just said the quiet part out loud. OpenAI, Google, and Meta are likely facing the same calculus: do you release a model that could automate zero-day exploit discovery, or do you sit on it? For now, Anthropic chose option two.
Project Glasswing matters more than the model itself. If this works — if Anthropic can actually use Mythos to identify and patch critical infrastructure vulnerabilities faster than attackers can find them — it flips the security equation. Right now, defenders are always one step behind. An AI that can continuously scan for weaknesses 24/7 across thousands of systems could finally give defenders an edge. The risk? If Mythos (or a similar model) ever leaks or gets stolen, attackers get the same advantage.
Now What?
**If you run critical infrastructure or enterprise software:** Watch for announcements about Project Glasswing partnerships. Anthropic hasn't said which organizations are participating, but if your sector handles power, water, finance, or healthcare, your security team should be tracking this.
**If you're a developer using Claude:** Nothing changes for you. The current Claude Sonnet and Opus models remain available via the API at anthropic.com/api. Mythos isn't replacing them; it's a separate research track.
**If you manage AI security policies at your company:** This sets a precedent. Start defining internal criteria for when a model is "too capable" to deploy publicly. Anthropic's 244-page system card (available at anthropic.com) offers a framework.
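One way to start operationalizing "too capable to deploy" is a simple release gate that compares evaluated capability scores against policy ceilings. Everything below is an invented placeholder — the metric names and thresholds are not from Anthropic's system card — but it shows the shape such a policy check could take:

```python
# Hypothetical deployment gate: all metric names and thresholds are placeholders.
RELEASE_THRESHOLDS = {
    "offensive_cyber": 0.30,  # max tolerated score on an internal exploit eval
    "autonomy": 0.50,         # max tolerated score on an autonomous-action eval
}

def release_decision(eval_scores: dict[str, float]) -> str:
    """Deny public release if any capability exceeds its policy ceiling."""
    breaches = [
        name for name, ceiling in RELEASE_THRESHOLDS.items()
        if eval_scores.get(name, 0.0) > ceiling
    ]
    return "internal-only: " + ", ".join(breaches) if breaches else "public"

decision = release_decision({"offensive_cyber": 0.72, "autonomy": 0.10})
```

The useful part isn't the numbers; it's writing the criteria down before a release decision forces the question.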
**If you're worried about AI-powered cyberattacks:** The cat's already out of the bag — multiple research papers show GPT-4 and Claude 3 can find simple vulnerabilities. The question isn't whether AI will be used for offensive security (it already is), but whether defensive AI can keep pace. Project Glasswing is the first serious attempt to find out.
Sources