Deep Dive · XiaoHu Explains

Anthropic Releases an AI Agent Architecture Guide: Ask Three Questions Before You Decide to Go Multi-Agent

Multi-agent systems can boost performance by 90.2%, but they also cost 10-15x more in tokens. This three-question framework helps you think it through before you commit to a complex architecture.

60-Second Overview

Anthropic released an enterprise-grade AI Agent architecture guide, drawing on real-world deployments from customers like Coinbase, Intercom, and Thomson Reuters to lay out architecture patterns and a selection method.
The guide lays out six core architectures: single agent, hierarchical (supervisor), collaborative (peer-to-peer), sequential workflow, parallel workflow, and evaluator-optimizer workflow.
Anthropic's internal research shows that for complex tasks requiring simultaneous exploration of multiple independent paths, multi-agent systems outperform single agents by 90.2% — but consume 10-15x the tokens of a single agent.
The guide offers a three-question decision framework (how much control do you need, how many domains does the problem span, what's your resource budget) to help you figure out which architecture fits — instead of defaulting to the most complex option.
The core method is "start simple, evolve gradually": validate value with a single agent first, then upgrade based on business needs and data feedback, rather than building a complex system from day one.

This is an official enterprise promotional guide from Anthropic, essentially a vendor white paper with a sales angle built in. The 90.2% and 10-15x token cost figures cited come from Anthropic's internal research; the 99.99%, 20x, 86%, and similar figures come from self-reported customer disclosures. None of it has been independently verified by a third party. What follows records the methods and data as presented in the original, without vouching for their accuracy.

1. What This Is

An Enterprise-Grade Agent Architecture Handbook from Anthropic

Anthropic recently released an AI Agent architecture guide aimed at enterprises, pulling together real-world deployment cases from customers like Coinbase, Intercom, and Thomson Reuters, and laying out six architecture patterns plus a decision framework for choosing between them.

The white paper's opening line makes the positioning clear right away: generative AI answers questions, AI agents solve problems. And this handbook is here to answer the next, more practical question: for a given business, should you go with a single AI agent working solo, or build out a full multi-agent system? It breaks that choice down into an actionable decision framework, backing every architecture with quantified data from real customers.

Why it's worth reading: the guide surfaces one key trade-off. For complex tasks requiring simultaneous exploration of multiple independent directions, multi-agent systems outperform single agents by 90.2%, but they consume 10-15x the tokens of a single agent. Performance and cost rise together, and that trade-off is the foundation for every selection judgment in the entire piece.

Let's get one thing straight first: AI agents work very differently from ordinary automation scripts. Traditional automation has to hard-code every step in advance — the process follows the script step by step, and it gets stuck the moment it hits a scenario nobody planned for. An agent's approach is to assess the task, pick the right tool, try an approach, check whether the result is good, and adjust — running that loop until the job is done.

Traditional Automation

Every step has to be pre-written. The process is fixed and traceable, but it can only follow the path laid out in advance — it breaks the moment it hits an edge case.

AI Agent

Given a task, it plans on its own: reads the question, checks account history, searches the knowledge base, drafts a reply, pulls in a human when needed — adjusting its strategy dynamically based on intermediate results throughout.

The guide opens with a batch of customer results, used to show what these systems have actually delivered in real production environments.

99.99%

The service availability Coinbase's Claude-powered customer service agent maintained under $226 billion in quarterly transaction volume

100x

The delivery-time improvement Tines achieved by collapsing a multi-step security operations process into a single agent operation

20x

The speedup Inscribe's fraud-review agent achieved by cutting review time from 30 minutes to 90 seconds

86%

The resolution-rate ceiling Intercom's Fin AI agent reached across more than 25,000 customers

What matters about these four numbers isn't their size but their spread: customer service, security operations, fraud review, and cross-industry resolution rates. Four completely different jobs, all handled by agents in production. They show that agents are worth investing in, but they don't yet answer which architecture you should pick.

The guide also offers an example closer to everyday operations: a retail bank using an agent to process credit risk memos. Where a relationship manager used to spend weeks manually cross-checking ten data sources, the agent now delivers a 20% to 60% productivity boost and cuts credit turnaround time by 30%.

Which architecture you use to build an agent is what this guide is really here to teach. It covers six architectures, which fall into three families: ones where the AI makes its own decisions (single agent, hierarchical supervisor, collaborative), ones where the process is scripted by humans in advance (sequential and parallel workflows), and one dedicated to quality control (evaluator-optimizer). Let's walk through them one by one, from simplest to most complex.

2. Start With the Simplest Architecture

Step One: Check Whether a Single Agent Is Enough

The first principle the guide keeps hammering is "start simple." Don't jump straight to a complex system — check first whether a single agent can solve the problem. It's cheaper, easier to debug, and its metrics map more directly onto business outcomes.

A single-agent system is one AI agent running a continuous loop: perceive the environment, decide the next step, execute it, until the task is done or it hits a stopping condition like "pause for human review." Its core is built from a handful of components.

Single-agent architecture: one AI model at the center, connected downward to three capability components, the whole thing running inside a perceive-decide-act loop
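The perceive-decide-act loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not the guide's implementation: `decide` stands in for the model call, the two arithmetic "tools" are hypothetical, and the step limit plays the role of the "pause for human review" stopping condition.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopState:
    goal: int                      # toy task: reach this number
    value: int = 0                 # what the agent currently "perceives"
    history: list = field(default_factory=list)


def decide(state: LoopState) -> str:
    """Stand-in for the model call: pick the next action from the current state."""
    if state.value == state.goal:
        return "done"
    if state.goal - state.value >= 10:
        return "add_ten"
    return "add_one"


# Hypothetical tools the agent can choose between.
TOOLS: dict = {
    "add_ten": lambda v: v + 10,
    "add_one": lambda v: v + 1,
}


def run_agent(goal: int, max_steps: int = 20) -> LoopState:
    state = LoopState(goal=goal)
    for _ in range(max_steps):
        action = decide(state)                     # perceive + decide
        if action == "done":
            state.history.append("done")
            return state
        state.value = TOOLS[action](state.value)   # act
        state.history.append(action)
    # Step budget exhausted: stop and hand over instead of looping forever.
    state.history.append("pause_for_human_review")
    return state
```

Calling `run_agent(23)` takes two large steps, three small ones, and then stops; a goal that cannot be reached within `max_steps` ends with the human-review marker instead.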

Of the three components in the diagram, two are specialized terms worth explaining here.

MCP (Model Context Protocol)

A standard interface that lets an agent connect to external systems (databases, web search, internal tools). It's what actually lets the agent go query data and operate tools, instead of just chatting. Think of it as fitting the AI with a universal adapter — whether it's plugging into a database or a search engine, the interface is standardized, so you don't need to build custom integration code every time.

Agent Skills

Packages the specialized knowledge, standard procedures, and tool usage for a given domain into a module the agent can call when needed, instead of stuffing all that domain expertise into the prompt. It's like equipping the agent with a set of specialized toolboxes — pull out the legal toolbox for legal questions, swap in the finance one for finance questions — so one agent doesn't need every domain's knowledge crammed into its head at once.
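The "toolbox" idea can be illustrated with a small sketch that attaches only the skill a task needs. This is not the actual Agent Skills format; the `SKILLS` registry, its contents, and `build_prompt` are hypothetical names used only to show the on-demand loading principle.

```python
# Hypothetical skill modules; names and contents are illustrative only.
SKILLS = {
    "legal": "Follow the contract-review checklist before answering.",
    "finance": "Show every calculation and state the accounting basis.",
}


def build_prompt(question: str, domains: list) -> str:
    """Attach only the skills the task needs, not every domain's knowledge."""
    loaded = [f"[skill:{name}] {SKILLS[name]}" for name in domains if name in SKILLS]
    return "\n".join(loaded + [f"Question: {question}"])
```

For a legal question, `build_prompt("Is this clause enforceable?", ["legal"])` carries the legal skill text and leaves the finance skill out of the context entirely.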

When to Use It / When Not To

Good fit: open-ended problems where you can't tell upfront how the path will unfold — you don't know how many steps it'll take or what obstacles it'll hit — so you let the agent adjust as it goes.

Not a fit: scenarios that require a perfect answer on the first try. Here a single agent falls short — either add specialized Skills to boost accuracy, or only then consider going multi-agent. Before upgrading, think it through: would adding skills to the single agent already be enough?

How a research agent completes a complex query in 8 steps

An employee asks: "Research what remote-work productivity tools the engineering team is adopting, and see if any of them correlate with our internal productivity metrics." This agent, connected via MCP to a content library, business tools, and the dev environment, handles it like this:

① Receive question → ② Initial analysis: determines this needs two data sources — external tool research and internal company metrics — and that the external research doesn't initially depend on internal data, so it can be split into parallel searches with correlation done at the end.
③ Activate skills: calls on research methodology, data correlation, and business insight skills, applying proven frameworks instead of reasoning from scratch.
④ Task decomposition and planning: settles on a plan: external web search, internal database query, parallel execution, then correlation and synthesis.
⑤ Parallel tool calls: runs a web search (for productivity tool trends) and a SQL database query (for internal productivity metrics) at the same time, with both tools running concurrently to cut total time significantly.
⑥ Iterative refinement: reviews the initial results, finds it needs more specific engineering-team data, and issues a more targeted follow-up query based on that.
⑦ Correlate and synthesize → ⑧ Generate result: cross-references external trends with internal metrics to produce an integrated conclusion.
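Steps ⑤ and ⑦ above can be sketched with `asyncio.gather`, which runs the two independent lookups concurrently and correlates them afterwards. The tool functions, the query strings, and the returned values are hypothetical stand-ins, not real MCP calls or real data.

```python
import asyncio


async def web_search(query: str) -> list:
    """Hypothetical stand-in for an external search tool."""
    await asyncio.sleep(0.01)          # simulate network latency
    return ["ToolA", "ToolB"]          # toy result: trending tool names


async def sql_query(statement: str) -> dict:
    """Hypothetical stand-in for an internal metrics database."""
    await asyncio.sleep(0.01)
    return {"ToolA": 0.12, "ToolC": 0.05}   # toy result: productivity deltas


def correlate(trending: list, metrics: dict) -> dict:
    """Keep only tools that appear in both the external and internal data."""
    return {tool: metrics[tool] for tool in trending if tool in metrics}


async def research() -> dict:
    # Both tool calls start at once; gather waits for both to finish.
    trending, metrics = await asyncio.gather(
        web_search("remote-work productivity tools for engineering teams"),
        sql_query("SELECT tool, productivity_delta FROM team_metrics"),
    )
    return correlate(trending, metrics)


result = asyncio.run(research())
```

Because neither lookup depends on the other, total wall-clock time is roughly that of the slower call rather than the sum of both.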

3. The Math You Need to Do First

When It Stops Working: Signs You Should Go Multi-Agent

Multi-agent only enters the picture once a single agent hits a fundamental ceiling. The guide gives three clear trigger signals — but it also puts the cost right on the table.

Going multi-agent means breaking a task apart, handing the pieces to multiple agents that each have their own specialty, and synthesizing their results into one coherent answer once they're done. The core figure from Anthropic's internal research: for complex tasks requiring simultaneous exploration of multiple independent directions, multi-agent systems outperform single agents by 90.2%. The guide's judgment is that once intelligence crosses a certain threshold, "multi-agent systems become the key path to scaling performance," because what a group of agents can accomplish far exceeds what a single one can, just as with human organizations.

Performance gain (complex tasks): +90.2%
Token consumption (vs. single agent): 10-15×

Put these two figures side by side and you get the whole piece's north star: performance really can jump substantially, but cost multiplies right along with it. Also keep a close eye on where that 90.2% applies. It is the number most likely to get quoted out of context in the whole guide: it was measured only on tasks that "require simultaneously exploring multiple independent directions." Don't use it to vouch for every scenario. The guide's warning is equally blunt: simple queries shouldn't trigger an expensive multi-agent pipeline, and the system needs to be able to scale its investment dynamically based on task value.
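To make the cost side concrete, here is a back-of-the-envelope sketch. Only the 10-15x multiplier comes from the guide; the price per million tokens, the task values, and the routing rule (escalate only when the task is worth more than the worst-case token bill) are assumptions chosen for illustration.

```python
def multi_agent_token_range(single_agent_tokens: int) -> tuple:
    """Apply the guide's 10-15x multiplier to a single-agent token estimate."""
    return single_agent_tokens * 10, single_agent_tokens * 15


def choose_pipeline(task_value_usd: float,
                    single_agent_tokens: int,
                    usd_per_million_tokens: float) -> str:
    """Assumed rule: go multi-agent only if the task value exceeds the worst-case cost."""
    _, high = multi_agent_token_range(single_agent_tokens)
    worst_case_cost = high / 1_000_000 * usd_per_million_tokens
    return "multi_agent" if task_value_usd > worst_case_cost else "single_agent"
```

With a 20,000-token single-agent run, the multi-agent range is 200,000 to 300,000 tokens. At a hypothetical 5 USD per million tokens the worst case is 1.50 USD, so a task valued at 0.50 USD stays on the single agent while one valued at 50 USD can justify escalation.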

Three Signals That Say It's Time for Multi-Agent

① Open-ended problems: it's very hard to predict how many steps it'll take, and you need to change direction mid-investigation and explore side branches at any point.

② The task spans multiple unrelated domains: the guide cites research showing that when a single task mixes two or more unrelated specialty domains (say, both reviewing legal terms and modeling finances), a single agent starts dropping the ball, and performance falls off sharply.

③ Broad parallelism: the problem requires pursuing multiple independent directions at once, and parallel processing delivers a substantial performance gain.

The cost isn't just tokens — multi-agent also makes debugging much harder. Traditional debugging methods stop working here, because agents make dynamic decisions and every run is non-deterministic. As decisions multiply, you need tracing that can capture how agents communicate, delegate, and synthesize results with each other — otherwise, the moment coordination goes wrong, it's nearly impossible to troubleshoot.
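A minimal sketch of what such tracing might record: each delegation, message, or synthesis step becomes a span linked to its parent, so a bad final answer can be walked back to the step that caused it. The `Span` and `Tracer` classes are hypothetical and not tied to any particular tracing product.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: int
    parent_id: Optional[int]
    agent: str
    kind: str       # "delegate", "message", or "synthesize"
    detail: str


class Tracer:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.spans = []

    def record(self, agent: str, kind: str, detail: str,
               parent_id: Optional[int] = None) -> int:
        """Store one step and return its id so later steps can link to it."""
        span = Span(next(self._ids), parent_id, agent, kind, detail)
        self.spans.append(span)
        return span.span_id

    def lineage(self, span_id: int) -> list:
        """Walk parent links back to the root and return the chain in order."""
        by_id = {s.span_id: s for s in self.spans}
        chain = []
        current = by_id.get(span_id)
        while current is not None:
            chain.append(f"{current.agent}:{current.kind}")
            current = by_id.get(current.parent_id)
        return list(reversed(chain))
```

Recording a supervisor's delegation, a specialist's reply, and the final synthesis as linked spans lets `lineage` reconstruct who handed what to whom, which is the information ordinary stack traces cannot provide for non-deterministic runs.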

4. Core Pattern One

Hierarchical Supervision: One Manager, Several Specialists

Once you've decided to go multi-agent, the first question to answer is how to organize the group: with someone in charge, or without. The version with someone in charge is called hierarchical supervision, and it's also the most common multi-agent pattern: a central supervisor analyzes the request, assigns tasks to the appropriate specialist agents, then synthesizes their results — forming a clear chain of accountability.

The key mechanism: in a hierarchical system, each specialist sub-agent is treated as a "tool," and the supervisor decides which specialist to invoke via tool calls. This mirrors how an efficient human team operates — specialists focus on their own domain, while the coordinator handles delegation and integration. Sub-agents can themselves have their own sub-agents, and those lower layers are hidden from the supervisor above — the supervisor only deals with the sub-team's lead, unaware of how many layers exist underneath.

Let's walk through the flow with a marketing-team example.

The marketing director agent assigns tasks to four specialists; each reports back once done, and the director synthesizes and approves a complete campaign — the specialists never talk to each other directly
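The "specialists as tools" mechanism can be sketched as a supervisor that looks up each specialist by name and calls it like a function. The four specialist names and their outputs are hypothetical (the guide does not name them here), and `plan` stands in for the supervisor's model call.

```python
from typing import Callable, Dict, List

# Hypothetical specialists; a real one would wrap its own model call.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "copywriter": lambda brief: f"copy for {brief}",
    "designer": lambda brief: f"visuals for {brief}",
    "analyst": lambda brief: f"audience data for {brief}",
    "media_buyer": lambda brief: f"channel plan for {brief}",
}


def plan(request: str) -> List[str]:
    """Stand-in for the supervisor's model call choosing which 'tools' to invoke."""
    return list(SPECIALISTS)


def supervise(request: str) -> str:
    # Each specialist is invoked as a tool; specialists never call each other.
    reports = {name: SPECIALISTS[name](request) for name in plan(request)}
    # The supervisor alone synthesizes the reports into one deliverable.
    return " | ".join(f"{name}: {report}" for name, report in reports.items())
```

Because every result passes back through `supervise`, there is a single place to enforce business rules and a single chain of accountability.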

This pattern's biggest pitfall is context management, and it's worth calling out on its own.

Key Challenge: Context Piles Up Into a Mess

As a system runs longer, conversation history and intermediate results fill up the AI's "brain capacity" (its context window), showing up as window overflow, degraded reasoning, and coordination breakdowns between agents. There are three fixes:

Context editing: as it approaches the token limit, automatically clean up stale tool-call records and results while keeping the conversation flow intact.
Memory tools: store information in a file system outside the context window, reading it back when needed, and retaining it across sessions.
Built-in tool throttling: add pagination, range selection, filtering, and truncation to tools, using sensible defaults to cap single-response size at a manageable level (say, around 25,000 tokens), preventing the context from getting overwhelmed.
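The third fix can be sketched as a tool wrapper that paginates and then truncates to a token cap. The 25,000 default mirrors the figure above; the four-characters-per-token estimate, the function names, and the response fields are assumptions, not a real tokenizer or API.

```python
def approx_tokens(text: str) -> int:
    """Crude estimate of about 4 characters per token; not a real tokenizer."""
    return (len(text) + 3) // 4


def paged_tool_response(rows: list, page: int = 1, page_size: int = 50,
                        max_tokens: int = 25_000) -> dict:
    """Return one page of rows, cut short if it would exceed the token cap."""
    start = (page - 1) * page_size
    selected = rows[start:start + page_size]
    kept, used = [], 0
    for row in selected:
        cost = approx_tokens(row)
        if used + cost > max_tokens:
            break                      # stop before the cap is exceeded
        kept.append(row)
        used += cost
    return {
        "rows": kept,
        "truncated": len(kept) < len(selected),   # the page itself was cut short
        "has_more": start + len(kept) < len(rows),
        "tokens": used,
    }
```

The `truncated` and `has_more` flags tell the agent that more data exists, so it can ask for the next page deliberately instead of having its context flooded by default.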

The guide also acknowledges this pattern uses more tokens, but its judgment is that the math works out: for high-value complex tasks that need specialized knowledge or exceed a single agent's context limits, the performance gain is worth the cost.

5. Core Pattern Two

Collaborative: Agents Talk to Each Other to Solve Problems

This is the "no one in charge" version, and it's also the one most easily confused with hierarchical supervision. The core difference is exactly one thing: nobody manages anybody. Coordination emerges organically from the agents' interactions, rather than being imposed by a central authority.

In a collaborative system, agents are autonomous peers that communicate directly point-to-point, dynamically negotiate their own roles, and jointly work through complex problems. The system reaches consensus through three mechanisms.

Competitive intelligence case: pricing, product, marketing, finance, social media, and strategic intel — six peer agents sharing findings and cross-validating in real time, with no central command

Group-chat-style discussion: multiple agents share a common conversation thread, working through problems, making decisions, and cross-checking each other as they go — like a project group chat.

Event-driven coordination: uses "events" as the shared language — every action gets broadcast as a structured update, and other agents pick up work and sync context based on it.

Blackboard-style shared knowledge base: sets up a central "blackboard" that all agents can read and write to — whoever makes a discovery posts it there, serving as collective memory.
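Of the three mechanisms, the blackboard is the easiest to sketch: a shared store that any agent can post to and read from. The `Blackboard` class, its method names, and the "two independent agents" corroboration rule are hypothetical illustrations of cross-validation, not something the guide specifies.

```python
from collections import defaultdict


class Blackboard:
    """Shared memory that every peer agent can read and write."""

    def __init__(self) -> None:
        self._entries = defaultdict(list)

    def post(self, topic: str, agent: str, finding: str) -> None:
        self._entries[topic].append((agent, finding))

    def read(self, topic: str) -> list:
        # Return a copy so readers cannot mutate the shared record.
        return list(self._entries.get(topic, []))

    def corroborated(self, topic: str, min_agents: int = 2) -> bool:
        """Assumed rule: a topic counts as cross-validated once enough distinct agents posted."""
        return len({agent for agent, _ in self.read(topic)}) >= min_agents
```

In the competitive-intelligence example, a pricing agent and a social media agent posting to the same topic would mark it as corroborated without either agent being told to check the other.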

Key Challenge: Unpredictable Behavior

The collaborative pattern's risk grows directly out of its freedom. Frequent inter-agent communication drives up compute cost and complexity; worse, it can produce "emergent behavior" — behavior nobody specifically programmed but that arises on its own — where a small change can unpredictably ripple through how the whole system behaves. The way to handle this isn't to issue rigid commands but to set up a solid collaboration framework: define roles, problem-solving approaches, and budget, while preventing agents from endlessly punting tasks to each other, and building in conflict-resolution mechanisms.
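One simple guard against agents endlessly punting tasks to each other is a handoff budget carried on the task itself. This is a sketch under assumptions: the `Task` class, the default limit of three, and the exception name are all hypothetical.

```python
from dataclasses import dataclass, field


class HandoffLimitExceeded(RuntimeError):
    """Raised when a task has been passed along more times than its budget allows."""


@dataclass
class Task:
    description: str
    max_handoffs: int = 3
    path: list = field(default_factory=list)   # agents that have owned the task, in order

    def hand_to(self, agent: str) -> None:
        # The first assignment is free; every later transfer counts as a handoff.
        handoffs_so_far = max(len(self.path) - 1, 0)
        if self.path and handoffs_so_far >= self.max_handoffs:
            raise HandoffLimitExceeded(
                f"{self.description!r} already passed through {self.path}"
            )
        self.path.append(agent)
```

When the budget is exhausted, the exception carries the full ownership path, which gives a conflict-resolution step (or a human) the context needed to decide who should finish the work.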

Lay these two multi-agent patterns side by side and the difference becomes clear at a glance.

Hierarchical (tree)

A tree-shaped chain of command. One supervisor at the top, specialists below: tasks flow down, results flow back up.

Clear accountability chain, auditable and traceable
The supervisor can enforce business rules while keeping oversight in place
Good fit for medium-control scenarios needing both flexibility and oversight: customer service, content creation, data analysis

Collaborative (mesh)

A mesh of peer connections. Agents are all equals, and any of them can talk directly to any other; consensus emerges from discussion among themselves.

No central decision-maker; coordination emerges from interaction
Behavior is harder to predict, which in exploratory scenarios is a feature, not a bug
Good fit for low-control-requirement scenarios like research, brainstorming, complex analysis

6. Two Hard-Coded Workflows

Sequential vs. Parallel: Should Tasks Queue Up or Run at Once

The patterns above involve dynamic decision-making by agents; the next two are pre-scripted, static workflow orchestration (agentic workflows), defining what order agents execute tasks in and under what conditions. There's exactly one core difference: do tasks queue up, or run at the same time.

	Sequential Workflow	Parallel Workflow
How it runs	Executes in a single line, step by step — each step's output feeds the next step's input	Multiple lines execute at the same time, each producing its own result independently before being aggregated
Core value	Trades waiting time for accuracy: spend a bit more time in exchange for each AI call becoming a more focused, less error-prone simple task; the process is predictable, its cost estimable, and it can be debugged stage by stage	What it's for is speed and multiple perspectives: several tracks run at once, covering the problem from different angles, giving the conclusion more grounding
When to use it	The task can be cleanly split into fixed subtasks with clear linear dependencies, and needs an audit trail (compliance checks, approval chains, draft→review→publish)	Subtasks are independent and can run concurrently, multiple perspectives improve quality, and speed matters more than coordination overhead (risk assessment needing multiple perspectives, guardrail review: one model does the work while another specifically screens for inappropriate content, multi-path parallel evaluation)
When not to use it	A single agent can already handle it in just a few steps; agents need to collaborate rather than hand off; the process needs to loop back and iterate	Agents need to build on each other's output; a specific execution order is required; shared state needs modification without a conflict-resolution strategy; overly complex result-aggregation logic would actually hurt quality
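The sequential column of the table can be sketched as a fixed chain in which each stage's output becomes the next stage's input, with an audit trail recorded along the way. The three stage functions follow the draft, review, publish example; their bodies are hypothetical placeholders for agent calls.

```python
def draft(topic: str) -> str:
    return f"draft({topic})"


def review(text: str) -> str:
    return f"reviewed({text})"


def publish(text: str) -> str:
    return f"published({text})"


def run_pipeline(stages: list, payload: str) -> tuple:
    """Run stages strictly in order and keep a per-stage audit trail."""
    trail = []
    for stage in stages:
        payload = stage(payload)                 # output feeds the next input
        trail.append((stage.__name__, payload))
    return payload, trail


final, trail = run_pipeline([draft, review, publish], "release notes")
```

The trail is what makes the pattern debuggable stage by stage: if the published text is wrong, each intermediate output is available for inspection.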

Each workflow type comes with a real example. The sequential one is a data-science insight pipeline: an analysis request first goes to a scoping agent (determines analysis type, routes it), then to a data engineering agent (extracts and cleans data), then to an analysis agent (runs statistical modeling, or flags complex requests for human review), and finally to human review. The parallel one is financial risk assessment.

Parallel workflow: credit, market, operational, and compliance risk agents all run at once, each producing its own score, and a decision engine synthesizes them with weights based on institutional policy
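The parallel risk assessment can be sketched with a thread pool that runs the four scorers at once and a decision engine that applies policy weights. The scores and weights below are made-up illustrative values, and each scorer is a hypothetical stand-in for an agent call.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical scorers returning risk in [0, 1]; a real one would call its own agent.
def credit_risk(applicant: str) -> float:
    return 0.2


def market_risk(applicant: str) -> float:
    return 0.5


def operational_risk(applicant: str) -> float:
    return 0.1


def compliance_risk(applicant: str) -> float:
    return 0.4


SCORERS = {
    "credit": credit_risk,
    "market": market_risk,
    "operational": operational_risk,
    "compliance": compliance_risk,
}

# Assumed institutional policy weights; they sum to 1.
POLICY_WEIGHTS = {"credit": 0.4, "market": 0.3, "operational": 0.2, "compliance": 0.1}


def assess(applicant: str) -> tuple:
    # Submit every scorer first so they run concurrently, then collect results.
    with ThreadPoolExecutor(max_workers=len(SCORERS)) as pool:
        futures = {name: pool.submit(fn, applicant) for name, fn in SCORERS.items()}
        scores = {name: future.result() for name, future in futures.items()}
    overall = sum(POLICY_WEIGHTS[name] * score for name, score in scores.items())
    return scores, overall
```

Keeping the aggregation a plain weighted sum reflects the table's caution that overly complex result-aggregation logic can hurt quality.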

7. Letting the Work Get Polished Repeatedly

Evaluator-Optimizer: Getting AI to Grade Its Own Homework

This is the final workflow — two AIs split the labor, refining quality to a target standard through repeated cycles.

Evaluator-optimizer

One AI generates content, another specifically finds flaws and gives actionable feedback, cycling back and forth until it meets the bar — instead of producing something once and calling it done. Like a writer paired with an editor, where the editor keeps sending drafts back for revision until it's finally approved for publication.

Here's how the API documentation generation case runs: a generation agent analyzes the codebase and writes a first draft (endpoint descriptions, parameters, examples, auth requirements); a technical evaluation agent checks it line by line against the actual code implementation (are parameter types right? is endpoint coverage complete? do the examples actually run?), sends issues back, and the generation agent absorbs the feedback and revises.

Generation agent produces a draft → evaluation agent checks it and sends it back → generation agent absorbs the feedback and revises, typically converging to publishable quality in 2-4 rounds
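The generate, evaluate, revise cycle can be sketched with a toy version of the API documentation case. The endpoint set, the rule that the generator fixes one flagged issue per round, and the round limit are assumptions chosen to make the loop visible; real agents would replace both functions.

```python
# Toy "codebase": the endpoints that the documentation must cover.
ENDPOINTS = {"/users", "/orders", "/auth"}


def generate(draft: set, feedback: list) -> set:
    """Stand-in generator: first draft covers one endpoint, then fixes one issue per round."""
    if not draft:
        return {"/users"}
    return draft | set(feedback[:1])


def evaluate(draft: set) -> list:
    """Stand-in evaluator: list endpoints in the code that the draft does not document."""
    return sorted(ENDPOINTS - draft)


def refine(max_rounds: int = 4) -> tuple:
    draft, feedback = set(), []
    for round_number in range(1, max_rounds + 1):
        draft = generate(draft, feedback)
        feedback = evaluate(draft)
        if not feedback:                       # evaluator has no remaining issues
            return draft, round_number, True
    return draft, max_rounds, False            # budget spent without approval
```

Here `refine()` converges in three rounds. The explicit `max_rounds` cap matters in practice: without it, a generator and evaluator that disagree can burn tokens indefinitely.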

Where It Applies

Good fit: scenarios with clear evaluation criteria where iterative polishing delivers visible value — literary translation, code generation with safety requirements, professional documents needing exact tone, or research tasks needing multi-step reasoning plus verification.

Not a fit: when the first draft already hits the bar, when evaluation criteria are subjective and vague, or when time cost outweighs the quality gain; also skip it for real-time applications needing instant response, simple tasks like basic classification, environments with tight token budgets, and cases where the evaluator itself lacks the domain expertise to give meaningful feedback.

8. Ready to Use

The Three-Question Framework + a Real Evolution Path

Let's straighten out the chain we've walked through: if a single agent is enough, don't add more; only go multi-agent once you hit one of the three signals (open-ended, cross-domain, broad parallelism); once you go multi-agent, first choose whether someone's in charge or not; hand anything that can be pre-scripted to a workflow, and leave only what can't be scripted to an agent's own decision-making. This chain converges into the most valuable part of the whole piece: three questions you must answer. Think these three through first, then match them to an architecture, instead of stacking on technical complexity for its own sake.

Question 1: How much control do you need?

If you need to clearly explain to auditors, regulators, or executives why the system made a given decision, you need predictable, traceable behavior.

High (compliance · financial transactions · safety-critical) → Start with a single agent or a sequential workflow. A single agent with clear decision criteria handling loan approval is far easier to audit than a recommendation produced by three AI models collaborating.

Moderate (customer service · content · data analysis) → Consider a hierarchical multi-agent system. The supervisor enforces business rules while the specialists handle complexity, so you get both.

Low (research · brainstorming · complex analysis) → Collaborative multi-agent becomes viable. When the goal is exploring possibilities, an agent's unpredictability actually becomes a strength.

Question 2: How complex is your problem domain?

Don't over-engineer. If the work is simple and repetitive, a well-designed single agent is usually enough.

Single domain (answering product questions · processing returns · reports) → A single agent handles this kind of direct, repeatable task efficiently.

Multi-domain but predictable (onboarding · compliance workflows) → Sequential or parallel workflows. When you can draw out the process steps but each step needs different expertise, a workflow gives you structure without over-complicating things.

Complex, open-ended (strategic analysis · research · troubleshooting) → A multi-agent architecture. Worth it only when you need to break things into pieces and apply multiple methods and perspectives.

Question 3: What are your resource constraints?

Multi-agent systems use roughly 10-15x the tokens of a single agent — run the numbers against your expected call volume first.

Limited budget/tokens → A single agent, or a carefully designed parallel workflow.

Time-to-market pressure → Start with a single agent and plan an evolution path. A single agent can be deployed in weeks; a multi-agent system takes months to get right.

Long-term strategic investment → Design for modular evolution. Build the right interfaces into your first single agent, keeping the user experience consistent while leaving room to upgrade the back-end architecture later.

The guide adds a fourth question: do you need deep domain expertise? For a single domain with an established workflow, a single agent with specialized Skills is enough; only go multi-agent with specialized Skills when multiple distinct domains genuinely need to coordinate (say, legal review needing to work with financial analysis). Below is the full checklist, mapped item for item to the original and ready to apply directly.

Three-Question Decision Checklist (use this to evaluate which architecture fits your project)

1. What level of control do you need?
High control requirements (regulatory compliance, financial transactions, safety-critical operations) → Start with single agents or sequential workflows
Moderate control requirements (customer support, content creation, data analysis) → Consider hierarchical multi-agent systems
Low control requirements (research, brainstorming, complex analysis) → Collaborative multi-agent systems become viable

2. How complex is your problem domain?
Single domain problems (answering product questions, processing returns, generating reports) → Single agents handle these efficiently
Multi-domain but predictable problems (employee onboarding, compliance workflows, standard analysis tasks) → Sequential or parallel workflows
Complex, open-ended problems (strategic analysis, research projects, system troubleshooting) → Multi-agent architectures

3. What are your resource constraints?
Limited budget/tokens → Single agents or carefully designed parallel workflows
Time-to-market pressure → Start with single agents, plan an evolution path
Long-term strategic initiative → Design for modular evolution

4. Do you need deep domain expertise?
Single domain with established workflows → Single agent with specialized Skills
Multiple distinct domains requiring coordination → Multi-agent systems with specialized Skills
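The checklist above is essentially a set of lookup tables, which a short sketch can make explicit. The recommendation strings paraphrase the checklist; the dictionary keys and the `recommend` function are hypothetical names, and the sketch deliberately returns one suggestion per question rather than inventing a rule for combining them.

```python
CONTROL = {
    "high": "single agent or sequential workflow",
    "moderate": "hierarchical multi-agent system",
    "low": "collaborative multi-agent system",
}
DOMAIN = {
    "single": "single agent",
    "multi_predictable": "sequential or parallel workflow",
    "open_ended": "multi-agent architecture",
}
RESOURCES = {
    "limited_budget": "single agent or carefully designed parallel workflow",
    "time_to_market": "single agent with a planned evolution path",
    "long_term": "design for modular evolution",
}
EXPERTISE = {
    "single_domain": "single agent with specialized Skills",
    "multiple_domains": "multi-agent system with specialized Skills",
}


def recommend(control: str, domain: str, resources: str, expertise: str) -> dict:
    """Look up the checklist's suggestion for each of the four answers."""
    return {
        "control": CONTROL[control],
        "domain": DOMAIN[domain],
        "resources": RESOURCES[resources],
        "expertise": EXPERTISE[expertise],
    }
```

When the four suggestions disagree (for example, high control needs but an open-ended problem), the conflict itself is useful: it flags exactly the trade-off that needs a human decision.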

Quick-Reference Table: Best Fit by Pattern

The list below translates the constraints above into a direct matching table. Check it when deciding which category a given piece of business falls into.

Single agents work best for:
- Customer service flows with clear category boundaries
- Document processing with well-defined business rules
- Code review and basic development tasks
- Routine analysis and reporting

Sequential workflows work best for:
- Multi-step approval processes
- Content production pipelines (draft → review → publish)
- Data transformation and validation
- Compliance checks against multiple criteria

Parallel workflows work best for:
- Scenarios where multiple perspectives improve quality
- Independent analyses that can run at the same time
- Cases where speed matters more than coordination overhead
- Risk assessments needing diverse viewpoints

Multi-agent systems work best for:
- Complex problems requiring expertise across multiple domains
- Research and analysis projects
- Dynamic customer interactions spanning multiple systems
- Strategic planning and decision support

A Real Evolution Path: One E-Commerce Platform's Five Phases

The guide closes with a real path: an e-commerce platform's five phases moving from a single agent to multi-agent. It's not building a complex system in one shot — each phase adds complexity only once the previous phase has proven measurable value.

Phase 1

A single agent handles customer inquiries, proving value first

Phase 2

Adds a routing pattern, splitting order status, product questions, and complaints

Phase 3

Gives each category its own specialized agent, sharing context

Phase 4

Upgrades to a multi-agent system, coordinating inventory, payment, and shipping

Phase 5

Adds evaluator agents for quality assurance and continuous improvement

Production systems also often evolve into hybrid architectures: parallel processing nested inside hierarchical supervision, dynamic routing built into sequential workflows, a single agent automatically triggering a multi-agent system when it hits an edge case. When business value can support the added complexity, these combinations unlock capabilities no single pattern can achieve on its own.

The bottom line stays the same: architecture evolves with need. Start simple, measure everything, and only add complexity when it delivers measurable value. The best architecture today is the simplest one that meets today's requirements while leaving a path open for tomorrow.

Building Effective AI Agents · Anthropic

This piece is based on Anthropic's officially published enterprise guide, "Building Effective AI Agents: Architecture Patterns and Implementation Frameworks." It's a vendor white paper with a sales angle built in: the 90.2% and 10-15x token cost figures come from Anthropic's internal research, while the 99.99%, 100x, 20x, 86%, and 20-60% figures are vendor-disclosed data from the respective customer cases, none independently verified by a third party. The executive summary section of the original PDF has an editing flaw (a sentence cuts off at "found"); the three-question checklist and quick-reference table follow the original item for item.