The dual-LLM pattern in 200 words

The dual-LLM pattern explained short: one privileged planner with tools, one quarantined worker that reads untrusted input, a strict channel between them.

Written by

Mack Chi

The dual-LLM pattern in 200 words

The dual-LLM pattern splits an AI agent into two cooperating models so prompt injection cannot reach the tools. A privileged planner LLM holds user instructions, conversation history, and tool access. A quarantined worker LLM reads all untrusted content and has no tools at all. A strict structured channel connects them. That is the dual-LLM pattern, and that is what breaks the lethal trifecta. The long version covers the game-theoretic intuition, a real attack walkthrough, and implementation notes. This note is the one-pager for procurement reviews, security questionnaires, and architecture diagrams.

In simpler terms: two AIs. One plans the work and holds the permissions. The other reads the scary letter from a stranger and tells the first one, in a tiny structured note, what the letter said. The planner never reads the stranger's letter. The stranger cannot talk to the planner directly. That is the dual-LLM pattern.

The pattern, in five bullets

Planner LLM (privileged). Holds the user's instructions, the conversation history, and the tools. It is the only component that ever calls send_email, writes to a database, or opens a pull request. It never sees raw untrusted content.
Worker LLM (quarantined). Reads the untrusted stuff — the email body, the GitHub issue, the PDF a vendor sent, the web page the agent fetched. It has zero tools. It cannot send, write, post, or call anything. If it were possessed by a demon it would still be inert.
Channel between them. The planner asks structured questions. The worker answers in a tight schema — {"is_bug_report": true}, {"category": 2}, a short string slot the planner explicitly asked for. No free-form text passes from worker to planner. Injection cannot ride a JSON boolean.
Forbidden moves. The worker cannot summarize freely into the planner's context. The planner cannot paste untrusted content into its own prompt as a shortcut. Both rules are enforced by the proxy, not by asking the model nicely.
What gets logged. Every question the planner asked, every answer the worker gave, and the original untrusted blob the worker saw. When something goes sideways later, that tape gets replayed — not the planner's reconstructed reasoning.

That is the whole pattern. The reason the dual-LLM pattern works against the lethal trifecta is structural, not clever: the model with the tools never reads the attacker's text, and the model that reads the attacker's text has no tools to weaponize.

One strong opinion

The dual-LLM pattern is not a silver bullet. It is a discipline. Plenty of vendors claim "we have dual-LLM" when what they actually mean is "a second model summarizes the dangerous content and pastes the summary into the main context." That is not dual-LLM. That is dual-LLM cosplay. The summary is free-form text written by a model that just read the attacker's instructions, and it lands directly inside the privileged loop. Congratulations, the injection now travels through a slightly longer pipe.

The discipline is the four "no"s: no free-form worker output reaching the planner, no untrusted content reaching the planner through any side door, no tool access on the worker ever, and no human-readable summary used as a control signal. If any of those four leak, the architecture is a story, not a defense.

When to use it, when not to

Use the dual-LLM pattern when an agent has to touch all three legs of the trifecta — private data, untrusted input, external action — and none of the legs can be dropped. That is the case it was built for. Customer support triage that reads inbound tickets and also writes to a CRM. A coding agent that reads GitHub issues and also opens PRs. Anything where "just make the agent read-only" is not an acceptable answer.

Skip it when one of the legs is already missing. If the agent is genuinely read-only, or if the untrusted content is genuinely absent, the overhead is not worth it. Dual-LLM costs tokens, latency, and a question-asking loop that is more rigid than free chat. Pay that cost only when the triangle would otherwise close.

For the long version — the Guess Who analogy, the failure modes, the test against a real GitHub MCP exploit — see the original post.