Data exfiltration via Markdown image tags
Markdown image exfiltration leaks secrets from AI chat UIs through a single image tag in attacker-controlled content. The attack still works in 2026.
Written by
Mack Chi
Markdown image exfiltration in AI chat UIs
Markdown image exfiltration is the simplest data leak in modern AI chat products: a chat client renders Markdown, an attacker plants one image tag inside untrusted content, and the secrets in the model's context window get smuggled out through an HTTP request the browser makes on the user's behalf. No tool calls, no clicks, no obvious UI. The render is the exfiltration. Markdown image exfiltration bypasses every guardrail that lives inside the model — SSO, audit logs, tool allow-lists, careful system prompts — because the leak channel is the chat UI itself, not the agent.
The attack is well-documented. Simon Willison has been describing Markdown image exfiltration since at least 2023, and his prompt injection design patterns post calls it out by name. It still ships in new products every quarter.
The classic payload
The shape has not changed in three years. Somewhere in the model's context — a wiki page, a customer support ticket, a PDF attached to an email, the body of a Jira comment — sits one line:

The model is asked, politely or otherwise, to "include the loading indicator at the end of your reply." It does. The chat client renders Markdown, sees an image tag, and makes an HTTP request to
attacker.example/leak?token=sk-.... The image 404s. The user sees a tiny broken-image icon for half a second, if that. The attacker reads the access log.The technique stretches in every direction. The URL can carry an entire
Authorization: Bearer header value, the last paragraph of a confidential doc, or "every email subject from the last hour" — anything the model just had in its context window. The only constraints are URL length (forgiving) and the model's willingness to substitute the secret into a placeholder (very forgiving, by default). The image can also hide inside plausible-looking Markdown: a logo at the top of a generated report, a "click to retry" thumbnail, a "powered by" badge. Users will not read the raw output. The renderer will.For background on the broader pattern, see the earlier posts on prompt injection and the lethal trifecta. Markdown image exfiltration is the most boring possible instance of both. Untrusted content tells the agent what to write. The agent writes a string. The chat UI is the external-communication leg of the trifecta — no
send_email tool required, because the UI is already happy to send the HTTP request.Why markdown image exfiltration still works in 2026
The attack is old. Real products have been bitten by it publicly — ChatGPT, Copilot, Notion AI, Slack AI, GitHub Copilot Chat have all shipped some variant of a fix at some point, usually after a researcher dropped a PoC. And yet, every other quarter, a new AI chat surface ships, Markdown rendering gets turned on because the designer wanted nice headings, and the bug is rediscovered from scratch.
The blunt version: an AI chat product that renders Markdown without an image proxy and an outbound-URL allow-list is a data exfiltration channel with a chat interface bolted on. This should be table-stakes. It is not. Reproducing markdown image exfiltration against popular chat clients every few months still succeeds often enough that the result has stopped being surprising.
The reason it keeps happening is that the leak does not look like a security feature in the design doc. "We render Markdown" is a one-line product decision. "We render Markdown but route every image through a proxy that strips query strings, blocks redirects, and only fetches from domains on this allow-list" is an entire mini-project that nobody ships in week one. The first version is the version that survives. The first version is the bug.
The mitigation that actually works
There are bad mitigations and a good one.
The bad mitigations all live inside the model. "Tell the model not to emit image tags with secrets in the URL." "Detect images at output time and ask the model whether they look safe." "Refuse to render any URL with a query string longer than N." All of these lose the same way: an attacker who controls untrusted content in the context controls the model's output well enough to slip around any string-level filter. The model is the thing under attack. It cannot be the thing deciding whether the attack succeeds. (See the dual-LLM pattern for why model-internal defenses fail this class of attack.)
The good mitigation is structural and lives in the chat client, not the model:
- Never let the browser fetch image URLs from model output directly. Route every image through a server-side proxy under the product's control.
- Allow-list the domains the proxy will fetch from. The product's own CDN, a small list of known content hosts, nothing else. Attacker domains do not get to be on this list, which is the whole point.
- Strip query strings and fragments before fetching. If the legitimate use case is "render a logo," the legitimate use case does not need
?token=.... Drop it. - Serve the bytes back to the user from the proxy. The attacker's server never sees the user's IP, headers, or — critically — the URL the model tried to construct.
A chat client built this way can keep rendering Markdown, keep showing inline images, keep being pleasant. The exfiltration channel just closes. Untrusted content can still tell the model to write whatever URL it wants. The URL goes nowhere.
The structural version of the same idea, applied inside Archestra: untrusted tool output never reaches the model that holds the secrets in the first place. The dual-LLM pattern puts the suspicious content in front of a quarantined model with no tools and no ability to influence the main agent's outputs except through structured answers. A quarantined model cannot smuggle a secret into a Markdown image URL because it never had the secret and never wrote the URL. The chat-client proxy is the last line; dual-LLM is the first. Both are required.
If shipping an AI chat product, ten minutes with the renderer is enough to confirm the bug. If the image URL goes straight to the browser, markdown image exfiltration works. Fix it before the next stranger does.
