By James Aspinwall, co-written by Alfred – February 26, 2026
Most agent systems still treat execution as prompt-time behavior: the model reads instructions, improvises a plan, calls tools one step at a time, and logs each action as it goes.
That works, but it is expensive, slow, and difficult to make deterministic.
A better pattern is this:
- The agent translates business logic into code.
- The code is compiled and tested.
- The compiled function is registered in an MCP server as a callable tool.
- Later, the agent calls the MCP tool with arguments and gets results.
This shifts work from repeated reasoning tokens to reusable execution artifacts.
The Core Idea
Instead of asking an LLM to “figure out how to do the task” every time, you ask it once to generate a robust implementation, then treat that implementation as infrastructure.
Prompting becomes orchestration. Code becomes execution. MCP becomes the stable interface.
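To make the registration idea concrete, here is a minimal sketch of the pattern, illustrative only and not the real MCP SDK API: a registry maps a stable tool name to a compiled callable plus an argument schema that is enforced on every call. The names `register_tool` and `call_tool` are hypothetical.

```python
# Minimal sketch of the registration pattern -- illustrative only, not
# the real MCP SDK API. A registry maps a stable tool name to a
# compiled callable plus the argument schema enforced on every call.
REGISTRY = {}

def register_tool(name, fn, required_args):
    """Register a compiled function under a stable tool name."""
    REGISTRY[name] = (fn, frozenset(required_args))

def call_tool(name, args):
    """Validate arguments against the declared schema, then execute."""
    fn, required = REGISTRY[name]
    missing = required - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return fn(**args)

# Generated once by the agent, then treated as infrastructure.
def apply_discount(price, pct):
    return round(price * (1 - pct / 100), 2)

register_tool("apply_discount", apply_discount, ["price", "pct"])
result = call_tool("apply_discount", {"price": 120.0, "pct": 15})
```

The agent never re-derives the discount logic; it only supplies arguments at the stable interface.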
Email Campaign Example
You want to send a promotion email to N clients.
The naive approach:
- Agent reasons per recipient.
- Agent calls “send email” one-by-one.
- Agent logs one-by-one.
- Agent re-checks dedupe logic per step.
The compiled-tool approach:
- Agent generates send_campaign(template_id, recipients, variables, campaign_id).
- Function checks idempotency: “has this template already been sent to this recipient for this campaign?”
- Function dispatches all valid sends in batch or controlled parallelism.
- Function writes a durable campaign ledger.
- Function returns summary: sent, skipped, failed, retry_queue.
Now the agent just calls one MCP tool:
send_campaign(args)
The expensive cognition was paid once. Execution is fast, repeatable, and auditable.
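A sketch of what the compiled function might look like, with two simplifications for illustration: the ledger is an in-memory set (durable storage in production), and the email provider is injected as a hypothetical `dispatch` callable so the function is testable.

```python
# Sketch of the compiled campaign tool. The ledger records
# (campaign_id, template_id, recipient) keys so repeat calls skip
# anything already sent -- the idempotency check is paid for once in code.
LEDGER = set()  # durable store in production; in-memory here

def send_campaign(template_id, recipients, variables, campaign_id, dispatch):
    summary = {"sent": [], "skipped": [], "failed": [], "retry_queue": []}
    for recipient in recipients:
        key = (campaign_id, template_id, recipient)
        if key in LEDGER:
            summary["skipped"].append(recipient)
            continue
        try:
            dispatch(recipient, template_id, variables.get(recipient, {}))
            LEDGER.add(key)
            summary["sent"].append(recipient)
        except Exception:
            summary["failed"].append(recipient)
            summary["retry_queue"].append(recipient)
    return summary

# Calling twice with the same arguments sends each email only once.
sent_log = []
ok = lambda recipient, template, vars_: sent_log.append(recipient)
first = send_campaign("promo-1", ["a@x.com", "b@x.com"], {}, "c-42", ok)
second = send_campaign("promo-1", ["a@x.com", "b@x.com"], {}, "c-42", ok)
```

The second call returns immediately with everything in `skipped`, which is exactly the behavior the agent would otherwise have to re-reason about per recipient.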
Why This Is Better
1) Token Cost Drops
Once logic is compiled, repeated runs need only compact arguments. You avoid re-deriving the same loop, dedupe checks, and failure handling in prompt space.
2) Latency Drops
A single tool call can do what would otherwise require many LLM turns and tool invocations.
3) Determinism Improves
Compiled code with fixed inputs gives bounded behavior. You can still have nondeterministic external systems, but your control logic is stable.
4) Safety Improves
You can gate execution through code review, static checks, integration tests, allowlists, and permissioned MCP routing.
5) Observability Improves
You can instrument one tool boundary with structured logs, metrics, idempotency keys, and audit events.
Other High-Value Examples
Invoicing and Collections
Generate and compile:
- invoice_batch_generate(period, customer_segment)
- invoice_send_reminders(rule_set)
Behavior:
- idempotent invoice creation
- duplicate prevention
- payment-state checks
- automatic escalation policy
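The idempotent-creation and duplicate-prevention behavior can be sketched as follows, assuming a hypothetical `invoices` store keyed by (period, customer):

```python
# Sketch: idempotent invoice creation. The (period, customer) pair is
# the idempotency key, so re-running a batch never duplicates invoices.
invoices = {}

def invoice_batch_generate(period, customer_segment):
    created, skipped = [], []
    for customer in customer_segment:
        key = (period, customer)
        if key in invoices:
            skipped.append(customer)
            continue
        invoices[key] = {"period": period, "customer": customer, "state": "open"}
        created.append(customer)
    return {"created": created, "skipped": skipped}

first = invoice_batch_generate("2026-02", ["acme", "globex"])
rerun = invoice_batch_generate("2026-02", ["acme", "globex", "initech"])
```

A partial failure mid-batch is safe to retry: the rerun only creates what is missing.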
CRM Follow-Up Engine
Generate and compile:
- crm_followup_plan(account_ids, policy_id)
- crm_followup_execute(plan_id)
Behavior:
- stage-aware messaging
- cooldown windows
- no-repeat template checks
- channel-specific dispatch (email, WhatsApp, SMS)
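The cooldown-window and no-repeat checks above reduce to a small eligibility predicate; the three-day window and the `history` shape here are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Sketch: cooldown and no-repeat checks for a follow-up engine.
# `history` maps account_id -> list of (template_id, sent_at) records.
COOLDOWN = timedelta(days=3)

def eligible(account_id, template_id, history, now):
    past = history.get(account_id, [])
    if any(t == template_id for t, _ in past):
        return False  # never repeat the same template
    if past and now - max(ts for _, ts in past) < COOLDOWN:
        return False  # still inside the cooldown window
    return True

now = datetime(2026, 2, 26)
history = {"acct-1": [("intro", now - timedelta(days=1))]}
```

Because the policy lives in code, changing the cooldown is a one-line diff with a test, not a prompt edit.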
Data Quality Repair
Generate and compile:
- repair_customer_records(dataset_id, ruleset_id)
Behavior:
- schema validation
- normalization
- conflict resolution strategy
- immutable before/after snapshots
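A sketch of normalization with before/after snapshots; the two rules shown (lowercase emails, uppercase country codes) are illustrative stand-ins for a real ruleset:

```python
import copy

# Sketch: normalization with immutable before/after snapshots so every
# repair is auditable and reversible. The rules here are illustrative.
def repair_customer_records(records):
    before = copy.deepcopy(records)  # immutable "before" snapshot
    repaired = []
    for rec in records:
        fixed = dict(rec)
        fixed["email"] = fixed.get("email", "").strip().lower()
        fixed["country"] = fixed.get("country", "").upper() or "UNKNOWN"
        repaired.append(fixed)
    return {"before": before, "after": repaired}

result = repair_customer_records([
    {"email": " Ana@Example.COM ", "country": "pt"},
    {"email": "bo@example.com", "country": ""},
])
```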
Contract Operations
Generate and compile:
- contract_obligation_scan(contract_batch_id)
- contract_alert_schedule(obligation_set)
Behavior:
- deterministic extraction and deadline math
- escalation timelines
- notification fanout
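Deterministic deadline math means the same inputs always produce the same alert schedule. A sketch, with hypothetical 30/7/1-day alert offsets and the current date passed in explicitly to keep the function pure:

```python
from datetime import date, timedelta

# Sketch: deterministic deadline math for contract obligations.
# Alerts fire at fixed offsets before the deadline; dates already in
# the past are skipped. Offsets here are illustrative.
ALERT_OFFSETS = (timedelta(days=30), timedelta(days=7), timedelta(days=1))

def contract_alert_schedule(deadline, today):
    """Alert dates at fixed offsets before the deadline, skipping the past."""
    return [deadline - off for off in ALERT_OFFSETS if deadline - off >= today]

alerts = contract_alert_schedule(date(2026, 6, 30), date(2026, 6, 1))
```

Passing `today` as an argument rather than reading the clock inside the function is what makes the schedule reproducible and testable.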
Agentic ETL
Generate and compile:
- etl_pipeline_run(pipeline_id, source_snapshot_id)
Behavior:
- exactly-once semantics
- checkpointing
- retry with backoff
- reconciliation report
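Checkpointing plus capped retry can be sketched in a few lines. Assumptions: the checkpoint dict is durable in production (in-memory here), and backoff delays are recorded rather than slept so the sketch stays testable.

```python
# Sketch: checkpointed batch processing with capped retry and backoff.
# A re-run resumes after the last committed record -- exactly-once with
# respect to the checkpoint, at-least-once with respect to the source.
def etl_pipeline_run(rows, process, checkpoint, max_retries=3):
    start = checkpoint.get("offset", 0)
    backoffs = []
    for i in range(start, len(rows)):
        for attempt in range(max_retries):
            try:
                process(rows[i])
                break
            except Exception:
                backoffs.append(2 ** attempt)  # seconds we would sleep
        else:
            return {"status": "failed", "offset": i, "backoffs": backoffs}
        checkpoint["offset"] = i + 1  # commit progress after each record
    return {"status": "ok", "offset": checkpoint.get("offset", start),
            "backoffs": backoffs}

calls = []
def flaky(row):
    calls.append(row)
    if row == "b" and calls.count("b") == 1:
        raise RuntimeError("transient")

ckpt = {}
out = etl_pipeline_run(["a", "b", "c"], flaky, ckpt)
rerun = etl_pipeline_run(["a", "b", "c"], flaky, ckpt)
```

The transient failure on `"b"` is retried once with a one-second backoff; the rerun resumes at the checkpoint and processes nothing.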
A Practical Lifecycle for “Code-First Agents”
- Spec stage
  - Agent produces a structured execution spec (inputs, outputs, side effects, invariants).
- Code generation stage
  - Agent emits the implementation plus tests and a typed interface.
- Verification stage
  - Static checks, unit tests, integration tests, policy tests, and a sandbox run.
- Registration stage
  - Artifact is packaged and exposed as an MCP tool with a strict schema.
- Runtime stage
  - Agent calls the tool with arguments only.
  - Tool executes deterministically.
  - Observability and audit events are recorded.
- Evolution stage
  - New version generated and canary-tested.
  - Old version retained for rollback and reproducibility.
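The spec-stage output is the contract every later stage consumes, so it pays to make it a typed structure. A sketch, with field shapes chosen as illustrative assumptions:

```python
from dataclasses import dataclass

# Sketch of a structured execution spec: the contract that code
# generation, verification, and registration all consume.
@dataclass
class ExecutionSpec:
    name: str
    inputs: dict       # argument name -> type name
    outputs: dict      # result field -> type name
    side_effects: list # declared external writes
    invariants: list   # properties verification must check

spec = ExecutionSpec(
    name="send_campaign",
    inputs={"template_id": "str", "recipients": "list[str]",
            "variables": "dict", "campaign_id": "str"},
    outputs={"sent": "list[str]", "skipped": "list[str]",
             "failed": "list[str]", "retry_queue": "list[str]"},
    side_effects=["email.send", "ledger.append"],
    invariants=["no recipient receives the same template twice per campaign"],
)
```

Declared side effects are what let the registration stage wire up permission checks and audit events automatically.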
What Must Be In Place
This pattern is powerful only if your platform enforces boundaries:
- strict tool input schemas
- idempotency keys on write operations
- explicit side-effect declarations
- permission checks in logic layer, not just UI
- audit trails with actor, target, and outcome
- retry policies with caps and dead-letter queues
- test gates before tool publication
Without these, generated code can be faster but not safer.
Frontier Model Code Quality: Reality Check
Frontier models are now strong enough to generate useful production code, but quality varies by task shape.
They are generally strong at:
- scaffolding complete modules
- common framework usage
- serialization and API wiring
- test skeleton generation
They are inconsistent at:
- subtle state-machine invariants
- concurrency edge cases
- transactional boundaries
- security hardening details
- long-horizon refactors across many files
So the right stance is not “trust the model” or “never trust the model.”
The right stance is:
- use models to produce candidate implementations quickly
- enforce deterministic verification before promotion
- constrain runtime through MCP interfaces and policy
A Useful Quality Rubric for Generated Tool Code
Before promoting generated code to MCP tool status, score it on:
- Correctness: passes unit + integration tests with known edge cases
- Safety: permissions, input validation, side-effect controls
- Idempotency: repeat calls do not duplicate external effects
- Observability: emits structured logs/metrics/traces
- Rollback: versioned and reversible
- Cost profile: cheaper than prompt-time execution over expected volume
If a tool fails any of these, do not publish it.
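The rubric above can be encoded as a publication gate. Each criterion is assumed to arrive as a boolean from the verification pipeline; any failure, or any missing score, blocks publication.

```python
# Sketch: the rubric as a publication gate. Each criterion is a boolean
# produced by the verification pipeline; any failure blocks publication.
RUBRIC = ("correctness", "safety", "idempotency",
          "observability", "rollback", "cost_profile")

def may_publish(scores):
    """Return (ok, failures). A missing score counts as a failure."""
    failures = [c for c in RUBRIC if not scores.get(c, False)]
    return (len(failures) == 0, failures)

ok, failed = may_publish({c: True for c in RUBRIC})
blocked, failed2 = may_publish({**{c: True for c in RUBRIC},
                                "idempotency": False})
```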
Economic Intuition
Think of this as replacing repeated “reasoning compute” with reusable “execution compute.”
- Prompt-time agents pay per step, per recipient, per run.
- Compiled MCP tools pay once for implementation, then execute cheaply many times.
At low volume, prompt-only may be fine. At medium/high volume, compiled tooling usually wins on cost, speed, and reliability.
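The break-even point is simple arithmetic. The dollar figures below are hypothetical placeholders, not measured costs:

```python
import math

# Sketch: break-even volume between prompt-time execution and a
# compiled tool. All cost figures are hypothetical placeholders.
def break_even_runs(build_cost, per_run_prompt, per_run_compiled):
    """Runs after which the compiled tool is cheaper overall."""
    saving = per_run_prompt - per_run_compiled
    return math.ceil(build_cost / saving)

# e.g. $40 one-time generation-and-review cost, $0.25 per prompt-time
# run, $0.01 per compiled run -> compiled wins after 167 runs.
runs = break_even_runs(40.0, 0.25, 0.01)
```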
The Strategic Shift
The future agent stack is not “bigger prompts and more instructions.”
It is:
- models that synthesize operational code
- verification pipelines that prove behavior
- MCP servers that expose stable, permissioned execution interfaces
- agents that orchestrate those tools instead of re-implementing logic every run
That is how you get systems that are cheaper, faster, safer, and repeatable.
Not by asking the model to do everything every time.
By letting the model write code once, and letting infrastructure do the rest.