By James Aspinwall, co-written by Alfred – February 26, 2026
Most agent systems still treat execution as prompt-time behavior: the model reads instructions, improvises a plan, calls tools one step at a time, and logs each action as it goes.
That works, but it is expensive, slow, and difficult to make deterministic.
A better pattern is this:
- The agent translates business logic into code.
- The code is compiled and tested.
- The compiled function is registered in an MCP server as a callable tool.
- Later, the agent calls the MCP tool with arguments and gets results.
This shifts work from repeated reasoning tokens to reusable execution artifacts.
The Core Idea
Instead of asking an LLM to “figure out how to do the task” every time, you ask it once to generate a robust implementation, then treat that implementation as infrastructure.
Prompting becomes orchestration. Code becomes execution. MCP becomes the stable interface.
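To make the registration idea concrete, here is a minimal sketch of the pattern, illustrative only and not the real MCP SDK API: a registry maps a stable tool name to a compiled callable plus an argument schema that is enforced on every call. The names `register_tool` and `call_tool` are hypothetical.

```python
# Minimal sketch of the registration pattern -- illustrative only, not
# the real MCP SDK API. A registry maps a stable tool name to a
# compiled callable plus the argument schema enforced on every call.
REGISTRY = {}

def register_tool(name, fn, required_args):
    """Register a compiled function under a stable tool name."""
    REGISTRY[name] = (fn, frozenset(required_args))

def call_tool(name, args):
    """Validate arguments against the declared schema, then execute."""
    fn, required = REGISTRY[name]
    missing = required - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return fn(**args)

# Generated once by the agent, then treated as infrastructure.
def apply_discount(price, pct):
    return round(price * (1 - pct / 100), 2)

register_tool("apply_discount", apply_discount, ["price", "pct"])
result = call_tool("apply_discount", {"price": 120.0, "pct": 15})
```

The agent never re-derives the discount logic; it only supplies arguments at the stable interface.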
Email Campaign Example
You want to send a promotion email to N clients.
The naive approach:
- Agent reasons per recipient.
- Agent calls “send email” one-by-one.
- Agent logs one-by-one.
- Agent re-checks dedupe logic per step.
The compiled-tool approach:
- Agent generates send_campaign(template_id, recipients, variables, campaign_id).
- Function checks idempotency: “has this template already been sent to this recipient for this campaign?”
- Function dispatches all valid sends in batch or controlled parallelism.
- Function writes a durable campaign ledger.
- Function returns summary: sent, skipped, failed, retry_queue.
Now the agent just calls one MCP tool:
send_campaign(args)
The expensive cognition was paid once. Execution is fast, repeatable, and auditable.
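A sketch of what the compiled function might look like, with two simplifications for illustration: the ledger is an in-memory set (durable storage in production), and the email provider is injected as a hypothetical `dispatch` callable so the function is testable.

```python
# Sketch of the compiled campaign tool. The ledger records
# (campaign_id, template_id, recipient) keys so repeat calls skip
# anything already sent -- the idempotency check is paid for once in code.
LEDGER = set()  # durable store in production; in-memory here

def send_campaign(template_id, recipients, variables, campaign_id, dispatch):
    summary = {"sent": [], "skipped": [], "failed": [], "retry_queue": []}
    for recipient in recipients:
        key = (campaign_id, template_id, recipient)
        if key in LEDGER:
            summary["skipped"].append(recipient)
            continue
        try:
            dispatch(recipient, template_id, variables.get(recipient, {}))
            LEDGER.add(key)
            summary["sent"].append(recipient)
        except Exception:
            summary["failed"].append(recipient)
            summary["retry_queue"].append(recipient)
    return summary

# Calling twice with the same arguments sends each email only once.
sent_log = []
ok = lambda recipient, template, vars_: sent_log.append(recipient)
first = send_campaign("promo-1", ["a@x.com", "b@x.com"], {}, "c-42", ok)
second = send_campaign("promo-1", ["a@x.com", "b@x.com"], {}, "c-42", ok)
```

The second call returns immediately with everything in `skipped`, which is exactly the behavior the agent would otherwise have to re-reason about per recipient.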
Why This Is Better
1) Token Cost Drops
Once logic is compiled, repeated runs need only compact arguments. You avoid re-deriving the same loop, dedupe checks, and failure handling in prompt space.
2) Latency Drops
A single tool call can do what would otherwise require many LLM turns and tool invocations.
3) Determinism Improves
Compiled code with fixed inputs gives bounded behavior. You can still have nondeterministic external systems, but your control logic is stable.
4) Safety Improves
You can gate execution through code review, static checks, integration tests, allowlists, and permissioned MCP routing.
5) Observability Improves
You can instrument one tool boundary with structured logs, metrics, idempotency keys, and audit events.
Other High-Value Examples
Invoicing and Collections
Generate and compile:
- invoice_batch_generate(period, customer_segment)
- invoice_send_reminders(rule_set)
Behavior:
- idempotent invoice creation
- duplicate prevention
- payment-state checks
- automatic escalation policy
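The idempotent-creation and duplicate-prevention behavior can be sketched as follows, assuming a hypothetical `invoices` store keyed by (period, customer):

```python
# Sketch: idempotent invoice creation. The (period, customer) pair is
# the idempotency key, so re-running a batch never duplicates invoices.
invoices = {}

def invoice_batch_generate(period, customer_segment):
    created, skipped = [], []
    for customer in customer_segment:
        key = (period, customer)
        if key in invoices:
            skipped.append(customer)
            continue
        invoices[key] = {"period": period, "customer": customer, "state": "open"}
        created.append(customer)
    return {"created": created, "skipped": skipped}

first = invoice_batch_generate("2026-02", ["acme", "globex"])
rerun = invoice_batch_generate("2026-02", ["acme", "globex", "initech"])
```

A partial failure mid-batch is safe to retry: the rerun only creates what is missing.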
CRM Follow-Up Engine
Generate and compile:
- crm_followup_plan(account_ids, policy_id)
- crm_followup_execute(plan_id)
Behavior:
- stage-aware messaging
- cooldown windows
- no-repeat template checks
- channel-specific dispatch (email, WhatsApp, SMS)
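The cooldown-window and no-repeat checks above reduce to a small eligibility predicate; the three-day window and the `history` shape here are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Sketch: cooldown and no-repeat checks for a follow-up engine.
# `history` maps account_id -> list of (template_id, sent_at) records.
COOLDOWN = timedelta(days=3)

def eligible(account_id, template_id, history, now):
    past = history.get(account_id, [])
    if any(t == template_id for t, _ in past):
        return False  # never repeat the same template
    if past and now - max(ts for _, ts in past) < COOLDOWN:
        return False  # still inside the cooldown window
    return True

now = datetime(2026, 2, 26)
history = {"acct-1": [("intro", now - timedelta(days=1))]}
```

Because the policy lives in code, changing the cooldown is a one-line diff with a test, not a prompt edit.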
Data Quality Repair
Generate and compile:
- repair_customer_records(dataset_id, ruleset_id)
Behavior:
- schema validation
- normalization
- conflict resolution strategy
- immutable before/after snapshots
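A sketch of normalization with before/after snapshots; the two rules shown (lowercase emails, uppercase country codes) are illustrative stand-ins for a real ruleset:

```python
import copy

# Sketch: normalization with immutable before/after snapshots so every
# repair is auditable and reversible. The rules here are illustrative.
def repair_customer_records(records):
    before = copy.deepcopy(records)  # immutable "before" snapshot
    repaired = []
    for rec in records:
        fixed = dict(rec)
        fixed["email"] = fixed.get("email", "").strip().lower()
        fixed["country"] = fixed.get("country", "").upper() or "UNKNOWN"
        repaired.append(fixed)
    return {"before": before, "after": repaired}

result = repair_customer_records([
    {"email": " Ana@Example.COM ", "country": "pt"},
    {"email": "bo@example.com", "country": ""},
])
```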
Contract Operations
Generate and compile:
- contract_obligation_scan(contract_batch_id)
- contract_alert_schedule(obligation_set)
Behavior:
- deterministic extraction and deadline math
- escalation timelines
- notification fanout
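Deterministic deadline math means the same inputs always produce the same alert schedule. A sketch, with hypothetical 30/7/1-day alert offsets and the current date passed in explicitly to keep the function pure:

```python
from datetime import date, timedelta

# Sketch: deterministic deadline math for contract obligations.
# Alerts fire at fixed offsets before the deadline; dates already in
# the past are skipped. Offsets here are illustrative.
ALERT_OFFSETS = (timedelta(days=30), timedelta(days=7), timedelta(days=1))

def contract_alert_schedule(deadline, today):
    """Alert dates at fixed offsets before the deadline, skipping the past."""
    return [deadline - off for off in ALERT_OFFSETS if deadline - off >= today]

alerts = contract_alert_schedule(date(2026, 6, 30), date(2026, 6, 1))
```

Passing `today` as an argument rather than reading the clock inside the function is what makes the schedule reproducible and testable.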
Agentic ETL
Generate and compile:
- etl_pipeline_run(pipeline_id, source_snapshot_id)
Behavior:
- exactly-once semantics
- checkpointing
- retry with backoff
- reconciliation report
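Checkpointing plus capped retry can be sketched in a few lines. Assumptions: the checkpoint dict is durable in production (in-memory here), and backoff delays are recorded rather than slept so the sketch stays testable.

```python
# Sketch: checkpointed batch processing with capped retry and backoff.
# A re-run resumes after the last committed record -- exactly-once with
# respect to the checkpoint, at-least-once with respect to the source.
def etl_pipeline_run(rows, process, checkpoint, max_retries=3):
    start = checkpoint.get("offset", 0)
    backoffs = []
    for i in range(start, len(rows)):
        for attempt in range(max_retries):
            try:
                process(rows[i])
                break
            except Exception:
                backoffs.append(2 ** attempt)  # seconds we would sleep
        else:
            return {"status": "failed", "offset": i, "backoffs": backoffs}
        checkpoint["offset"] = i + 1  # commit progress after each record
    return {"status": "ok", "offset": checkpoint.get("offset", start),
            "backoffs": backoffs}

calls = []
def flaky(row):
    calls.append(row)
    if row == "b" and calls.count("b") == 1:
        raise RuntimeError("transient")

ckpt = {}
out = etl_pipeline_run(["a", "b", "c"], flaky, ckpt)
rerun = etl_pipeline_run(["a", "b", "c"], flaky, ckpt)
```

The transient failure on `"b"` is retried once with a one-second backoff; the rerun resumes at the checkpoint and processes nothing.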
A Practical Lifecycle for “Code-First Agents”
- Spec stage
  - Agent produces a structured execution spec (inputs, outputs, side effects, invariants).
- Code generation stage
  - Agent emits the implementation plus tests and a typed interface.
- Verification stage
  - Static checks, unit tests, integration tests, policy tests, and a sandbox run.
- Registration stage
  - Artifact is packaged and exposed as an MCP tool with a strict schema.
- Runtime stage
  - Agent calls the tool with arguments only.
  - Tool executes deterministically.
  - Observability and audit events are recorded.
- Evolution stage
  - New version generated and canary-tested.
  - Old version retained for rollback and reproducibility.
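The spec-stage output is the contract every later stage consumes, so it pays to make it a typed structure. A sketch, with field shapes chosen as illustrative assumptions:

```python
from dataclasses import dataclass

# Sketch of a structured execution spec: the contract that code
# generation, verification, and registration all consume.
@dataclass
class ExecutionSpec:
    name: str
    inputs: dict       # argument name -> type name
    outputs: dict      # result field -> type name
    side_effects: list # declared external writes
    invariants: list   # properties verification must check

spec = ExecutionSpec(
    name="send_campaign",
    inputs={"template_id": "str", "recipients": "list[str]",
            "variables": "dict", "campaign_id": "str"},
    outputs={"sent": "list[str]", "skipped": "list[str]",
             "failed": "list[str]", "retry_queue": "list[str]"},
    side_effects=["email.send", "ledger.append"],
    invariants=["no recipient receives the same template twice per campaign"],
)
```

Declared side effects are what let the registration stage wire up permission checks and audit events automatically.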
What Must Be In Place
This pattern is powerful only if your platform enforces boundaries:
- strict tool input schemas
- idempotency keys on write operations
- explicit side-effect declarations
- permission checks in logic layer, not just UI
- audit trails with actor, target, and outcome
- retry policies with caps and dead-letter queues
- test gates before tool publication
Without these, generated code can be faster but not safer.
Frontier Model Code Quality: Reality Check
Frontier models are now strong enough to generate useful production code, but quality varies by task shape.
They are generally strong at:
- scaffolding complete modules
- common framework usage
- serialization and API wiring
- test skeleton generation
They are inconsistent at:
- subtle state-machine invariants
- concurrency edge cases
- transactional boundaries
- security hardening details
- long-horizon refactors across many files
So the right stance is not “trust the model” or “never trust the model.”
The right stance is:
- use models to produce candidate implementations quickly
- enforce deterministic verification before promotion
- constrain runtime through MCP interfaces and policy
A Useful Quality Rubric for Generated Tool Code
Before promoting generated code to MCP tool status, score it on:
- Correctness: passes unit + integration tests with known edge cases
- Safety: permissions, input validation, side-effect controls
- Idempotency: repeat calls do not duplicate external effects
- Observability: emits structured logs/metrics/traces
- Rollback: versioned and reversible
- Cost profile: cheaper than prompt-time execution over expected volume
If a tool fails any of these, do not publish it.
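The rubric above can be encoded as a publication gate. Each criterion is assumed to arrive as a boolean from the verification pipeline; any failure, or any missing score, blocks publication.

```python
# Sketch: the rubric as a publication gate. Each criterion is a boolean
# produced by the verification pipeline; any failure blocks publication.
RUBRIC = ("correctness", "safety", "idempotency",
          "observability", "rollback", "cost_profile")

def may_publish(scores):
    """Return (ok, failures). A missing score counts as a failure."""
    failures = [c for c in RUBRIC if not scores.get(c, False)]
    return (len(failures) == 0, failures)

ok, failed = may_publish({c: True for c in RUBRIC})
blocked, failed2 = may_publish({**{c: True for c in RUBRIC},
                                "idempotency": False})
```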
Economic Intuition
Think of this as replacing repeated “reasoning compute” with reusable “execution compute.”
- Prompt-time agents pay per step, per recipient, per run.
- Compiled MCP tools pay once for implementation, then execute cheaply many times.
At low volume, prompt-only may be fine. At medium/high volume, compiled tooling usually wins on cost, speed, and reliability.
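The break-even point is simple arithmetic. The dollar figures below are hypothetical placeholders, not measured costs:

```python
import math

# Sketch: break-even volume between prompt-time execution and a
# compiled tool. All cost figures are hypothetical placeholders.
def break_even_runs(build_cost, per_run_prompt, per_run_compiled):
    """Runs after which the compiled tool is cheaper overall."""
    saving = per_run_prompt - per_run_compiled
    return math.ceil(build_cost / saving)

# e.g. $40 one-time generation-and-review cost, $0.25 per prompt-time
# run, $0.01 per compiled run -> compiled wins after 167 runs.
runs = break_even_runs(40.0, 0.25, 0.01)
```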
The Strategic Shift
The future agent stack is not “bigger prompts and more instructions.”
It is:
- models that synthesize operational code
- verification pipelines that prove behavior
- MCP servers that expose stable, permissioned execution interfaces
- agents that orchestrate those tools instead of re-implementing logic every run
That is how you get systems that are cheaper, faster, safer, and repeatable.
Not by asking the model to do everything every time.
By letting the model write code once, and letting infrastructure do the rest.