## The Round Trip Problem
Every time an AI agent needs to do something in the real world — read a database, call an API, check a file — it makes a tool call. The LLM generates a request, the client sends it to the server, the server executes it, the result comes back, the LLM reads the result, decides what to do next, and makes another tool call.
For a simple task like “get the weather in San Francisco,” that’s one round trip. Fine.
But real work isn’t one step. Consider “find all overdue invoices, email each customer a reminder, and log the outreach in the CRM.” That’s:
1. Call `list_invoices` with a filter → wait for result
2. LLM reads the list, decides to email customer #1
3. Call `send_email` for customer #1 → wait for result
4. Call `log_crm_interaction` for customer #1 → wait for result
5. LLM decides to email customer #2
6. Repeat steps 3-4 for each customer
7. …20 customers later, you’ve made 41 API calls
Each round trip costs time and tokens. The LLM re-reads the entire conversation history on every turn. For 20 customers, that’s 41 LLM inference calls, each one paying the full context window cost. A task that should take seconds takes minutes and burns through your API budget.
This is the fundamental inefficiency of the standard MCP tool-use loop.
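To see why the loop gets expensive, here is a back-of-the-envelope sketch. The per-message token counts (1,000 for the base context, 200 per round trip) are illustrative assumptions, not measurements; the point is the quadratic growth from re-reading history on every turn.

```python
# Back-of-the-envelope cost of the granular loop. The token counts below
# are illustrative assumptions, not measurements.
SYSTEM_AND_TOOLS = 1_000   # system prompt + tool definitions
PER_TURN = 200             # one tool call plus its result, roughly

def total_tokens(round_trips: int) -> int:
    """Each inference re-reads everything sent so far."""
    total = 0
    history = SYSTEM_AND_TOOLS
    for _ in range(round_trips):
        total += history      # context re-read on this turn
        history += PER_TURN   # the new call + result joins the history
    return total

print(total_tokens(1))    # → 1000: one round trip, just the base context
print(total_tokens(41))   # → 205000: the 20-customer workflow
```

Under these assumptions, the 41-round-trip workflow lands in the ~200K-token range, which is where the estimate in the comparison table at the end of this piece comes from.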
## How MCP Works Today
The Model Context Protocol (MCP) defines a clean interface between AI agents and external tools. The current spec (2025-11-25) works like this:
1. **Discovery** — The client sends the server a `tools/list` request and gets back a catalog of available tools, each with a name, description, and JSON Schema for inputs and outputs.
2. **Invocation** — The LLM decides to call a tool. The client sends `tools/call` with the tool name and arguments. The server executes it and returns the result.
3. **Loop** — The LLM reads the result, decides what to do next, and potentially calls another tool. This repeats until the task is done.
```
LLM → Client → Server: tools/call("list_invoices", {status: "overdue"})
Server → Client → LLM: [invoice_1, invoice_2, ..., invoice_20]
LLM → Client → Server: tools/call("send_email", {to: "customer_1@..."})
Server → Client → LLM: {sent: true}
LLM → Client → Server: tools/call("log_interaction", {contact: "customer_1"})
Server → Client → LLM: {logged: true}
... repeat 19 more times ...
```
Each arrow is a network hop, and every line starting with `LLM →` is preceded by a full inference call. The protocol is correct, but chatty.
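On the wire, each of those hops is a JSON-RPC 2.0 message. This sketch builds one request/response pair in the shape the MCP spec defines for `tools/call`; the `id` and payload values are illustrative.

```python
import json

# One hop of the loop above, as JSON-RPC 2.0 messages. Field names follow
# the MCP spec ("method", "params.name", "params.arguments", "result.content",
# "result.isError"); the id and payload values are made up for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_invoices",
        "arguments": {"status": "overdue"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": json.dumps(["invoice_1", "invoice_2"])}],
        "isError": False,
    },
}

print(json.dumps(request, indent=2))
```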
## The Compound Tool Pattern
The solution isn’t a protocol change — it’s a server-side design pattern. Instead of exposing granular tools that the LLM orchestrates one at a time, you expose compound tools that execute multi-step workflows server-side in a single call.
### Before: Granular Tools

```json
[
  {"name": "list_invoices", "description": "List invoices by status"},
  {"name": "send_email", "description": "Send an email to a contact"},
  {"name": "log_interaction", "description": "Log a CRM interaction"}
]
```
The LLM has to orchestrate all three, calling each one in sequence, for each customer.
### After: Compound Tool

```json
[
  {
    "name": "process_overdue_invoices",
    "description": "Find all overdue invoices, email each customer a reminder using the standard template, and log the outreach in the CRM. Returns a summary of actions taken.",
    "inputSchema": {
      "type": "object",
      "properties": {
        "template": {"type": "string", "description": "Email template name"},
        "dry_run": {"type": "boolean", "description": "If true, preview actions without executing"}
      }
    }
  }
]
```
One tool call. The server handles the loop internally — queries the database, iterates over results, sends emails, logs interactions. Returns a summary. The LLM makes one inference call instead of 41.
### What the Server Does Internally

```elixir
def process_overdue_invoices(template, dry_run) do
  invoices = Invoice.list(status: "overdue")

  results =
    Enum.map(invoices, fn invoice ->
      contact = Contact.get(invoice.contact_id)

      email_result =
        if dry_run do
          {:preview, Email.render(template, contact, invoice)}
        else
          Email.send(template, contact, invoice)
        end

      crm_result = unless dry_run, do: CRM.log_interaction(contact, "sent_reminder", invoice)

      %{contact: contact.name, invoice: invoice.id, email: email_result, crm: crm_result}
    end)

  %{processed: length(results), details: results}
end
```
The logic that the LLM would have orchestrated across 41 round trips now executes in a single server-side function. The LLM gets back a structured summary and can reason about the results.
## The Claude SDK Agent Loop
The Claude Agent SDK (and the Claude API’s tool use feature) implements the agentic loop that drives this interaction:
1. Send a message to Claude with available tools
2. Claude responds with a `tool_use` content block
3. Your code executes the tool and returns the result
4. Claude reads the result and either calls another tool or produces a final response
5. Repeat until `stop_reason` is `end_turn`
```python
import json

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "process_overdue_invoices",
        "description": "Find overdue invoices, email reminders, log in CRM",
        "input_schema": {
            "type": "object",
            "properties": {
                "template": {"type": "string"},
                "dry_run": {"type": "boolean"}
            }
        }
    }
]

messages = [{"role": "user", "content": "Process all overdue invoice reminders"}]

# Agent loop
while True:
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )
    if response.stop_reason == "end_turn":
        break

    # Record the assistant turn once, then execute every tool call it contains
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            result = execute_mcp_tool(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result)
            })
    # All results for this turn go back in a single user message
    messages.append({"role": "user", "content": tool_results})
```
With compound tools, this loop typically runs 1-2 iterations instead of 40+. Claude calls `process_overdue_invoices` once, gets the summary, and produces its final response.
## Parallel Tool Use
Claude can also call multiple tools simultaneously in a single response. If the LLM determines that two tool calls are independent, it emits both in the same turn:
```json
{
  "content": [
    {"type": "tool_use", "name": "get_sales_report", "input": {"quarter": "Q1"}},
    {"type": "tool_use", "name": "get_support_tickets", "input": {"status": "open"}}
  ]
}
```
The client executes both in parallel and returns both results. Two tools, one round trip. Combined with compound tools, this further reduces the total number of LLM inference calls.
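A minimal sketch of the client side of this, assuming `execute_mcp_tool` wraps your MCP client as in the agent loop above: run every `tool_use` block from one response concurrently, then return all results in a single user turn.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Stand-in for your MCP client's call method; a real one would do I/O.
def execute_mcp_tool(name, arguments):
    return {"tool": name, "ok": True}

def run_tool_calls(tool_use_blocks):
    """Execute all tool_use blocks from one response in parallel and
    build the matching tool_result blocks for the next user message."""
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(
            lambda b: execute_mcp_tool(b["name"], b["input"]),
            tool_use_blocks,
        ))
    return [
        {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": json.dumps(result),
        }
        for block, result in zip(tool_use_blocks, outputs)
    ]

blocks = [
    {"type": "tool_use", "id": "tu_1", "name": "get_sales_report", "input": {"quarter": "Q1"}},
    {"type": "tool_use", "id": "tu_2", "name": "get_support_tickets", "input": {"status": "open"}},
]
results = run_tool_calls(blocks)
print(len(results))  # both results go back in one round trip
```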
## Structured Outputs and Output Schemas
The latest MCP spec adds `outputSchema` to tool definitions. This tells the LLM exactly what shape the result will be, so it can plan its next action without guessing:
```json
{
  "name": "process_overdue_invoices",
  "outputSchema": {
    "type": "object",
    "properties": {
      "processed": {"type": "integer"},
      "failed": {"type": "integer"},
      "details": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "contact": {"type": "string"},
            "invoice_id": {"type": "integer"},
            "email_sent": {"type": "boolean"},
            "crm_logged": {"type": "boolean"}
          }
        }
      }
    }
  }
}
```
With strict tool use (`strict: true` in the Claude API), the LLM’s tool call inputs are guaranteed to match the input schema exactly. No type mismatches, no missing fields. This eliminates retry round trips caused by malformed inputs.
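The server can enforce the same contract before returning a result. This is a hand-rolled sketch, not a full JSON Schema validator; it only checks the top-level property types declared in the `outputSchema` above.

```python
# Minimal check that a compound tool's result matches the declared
# outputSchema. A real server would use a proper JSON Schema validator;
# this sketch only covers the top-level property types.
def matches_output_schema(result: dict) -> bool:
    return (
        isinstance(result.get("processed"), int)
        and isinstance(result.get("failed"), int)
        and isinstance(result.get("details"), list)
    )

summary = {
    "processed": 20,
    "failed": 0,
    "details": [
        {"contact": "Acme Corp", "invoice_id": 1042,
         "email_sent": True, "crm_logged": True},
    ],
}
print(matches_output_schema(summary))  # → True
```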
## Design Guidelines for Compound Tools
### When to Compound
- **Predictable sequences** — If steps always follow the same pattern (query → iterate → act → log), compound them. The LLM adds no value deciding “should I email the next customer?” — of course it should.
- **Batch operations** — Any “do X for each item in a list” pattern. The server iterates faster than the LLM can.
- **Transactional workflows** — Operations that should succeed or fail atomically. A compound tool can roll back on failure; the LLM can’t.
### When NOT to Compound
- **Decision-dependent sequences** — If the next step depends on LLM reasoning (“should I escalate this ticket or auto-resolve it?”), keep tools granular. The LLM needs to see each result to decide.
- **Novel workflows** — If the combination of steps varies each time, let the LLM orchestrate. Compound tools encode fixed patterns.
- **Debugging and visibility** — Granular tools are easier to audit. If you need to see every step, don’t hide them inside a compound tool.
### The Hybrid Approach
Expose both granular and compound tools. The LLM can use `process_overdue_invoices` for the common case and fall back to individual `send_email` + `log_interaction` calls when it needs fine-grained control.
```json
[
  {"name": "process_overdue_invoices", "description": "Batch: find, email, and log all overdue invoices"},
  {"name": "list_invoices", "description": "List invoices by status filter"},
  {"name": "send_email", "description": "Send a single email"},
  {"name": "log_interaction", "description": "Log a single CRM interaction"}
]
```
Claude is smart enough to pick the compound tool when the task matches and the granular tools when it needs flexibility.
### The Dry Run Pattern
Compound tools should support a `dry_run` parameter. This lets the LLM preview what will happen before committing:
1. LLM calls `process_overdue_invoices(dry_run: true)` — one round trip
2. Server returns a preview: “Would email 20 customers, log 20 interactions”
3. LLM presents the preview to the user for approval
4. User approves
5. LLM calls `process_overdue_invoices(dry_run: false)` — one round trip
Two round trips total, with human approval in the middle. Without the compound tool pattern, the dry run alone would be 21 round trips.
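A minimal Python sketch of the dry-run branch; this `process_overdue_invoices` is a simplified stand-in for the real compound tool, and the invoice data is fabricated for illustration.

```python
# Simplified stand-in for a compound tool with a dry-run branch.
# The invoice data is fabricated; real side effects (email, CRM) are
# elided and would only run on the second, approved call.
def process_overdue_invoices(invoices, dry_run=True):
    if dry_run:
        # Report what would happen without touching anything
        return {
            "dry_run": True,
            "would_email": len(invoices),
            "would_log": len(invoices),
        }
    # ... send emails and log CRM interactions here ...
    return {"dry_run": False, "processed": len(invoices)}

invoices = [{"id": i} for i in range(20)]
preview = process_overdue_invoices(invoices, dry_run=True)
print(preview)  # → {'dry_run': True, 'would_email': 20, 'would_log': 20}
```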
## How The Orchestrator Fits
The Orchestrator is purpose-built for this pattern. Each MCP tool registered in The Orchestrator can be:
- A compound tool that executes a multi-step workflow server-side
- Permission-gated so only authorized agents can trigger batch operations
- Rate-limited to prevent runaway compound tools from processing thousands of records
- Audited with the full execution trace logged, even though the LLM only sees the summary
- Dry-run enabled with approval workflows before destructive compound operations execute
The server-side Elixir runtime is ideal for compound tools — concurrent processing with `Task.async_stream`, fault tolerance with supervisors, and the BEAM’s ability to handle thousands of concurrent operations without breaking a sweat.
## The Numbers
For the overdue invoice example with 20 customers:
| Approach | LLM Calls | Tool Calls | Tokens (est.) | Time (est.) |
|---|---|---|---|---|
| Granular tools | 41 | 41 | ~200K | ~3 min |
| Compound tool | 2 | 1 | ~5K | ~5 sec |
| Compound + dry run | 3 | 2 | ~8K | ~8 sec |
The compound tool approach uses 97% fewer tokens and completes 36x faster. The savings scale linearly — 200 customers would be 401 LLM calls vs. 2.
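The scaling arithmetic, spelled out (assuming one list call plus an email and a log call per customer, each call followed by an LLM inference):

```python
# Why the granular approach scales linearly with customer count while
# the compound approach stays constant. The call-counting model is the
# one used in the text: one list call, then two calls per customer.
def granular_llm_calls(customers: int) -> int:
    return 1 + 2 * customers  # list + (email, log) per customer

COMPOUND_LLM_CALLS = 2  # one tool call, one final response

print(granular_llm_calls(20))   # → 41
print(granular_llm_calls(200))  # → 401
```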
## Summary
MCP gives AI agents a standard way to call tools. The Claude SDK gives agents a loop to orchestrate those calls. But the real efficiency gain comes from tool design, not protocol features.
Compound tools move the orchestration logic from the LLM (expensive, slow, token-hungry) to the server (fast, cheap, deterministic). The LLM focuses on what it’s good at — understanding intent, making judgment calls, communicating with users. The server handles what it’s good at — iterating over data, calling APIs, maintaining consistency.
Design your MCP tools for the workflow, not for the individual operation. One smart tool call beats forty dumb ones.