Agents Should Write Code, Register It as MCP Tools, and Reuse It

By James Aspinwall, co-written by Alfred – February 26, 2026

Most agent systems still treat execution as prompt-time behavior: the model reads instructions, improvises a plan, calls tools one step at a time, and logs each action as it goes.

That works, but it is expensive, slow, and difficult to make deterministic.

A better pattern is this:

  1. The agent translates business logic into code.
  2. The code is compiled and tested.
  3. The compiled function is registered in an MCP server as a callable tool.
  4. Later, the agent calls the MCP tool with arguments and gets results.

This shifts work from repeated reasoning tokens to reusable execution artifacts.
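The four steps above can be sketched end to end. This is a minimal illustration, not a real MCP implementation: a plain-Python registry stands in for the MCP server, and the names (`ToolRegistry`, `campaign_send_bulk`) are illustrative.

```python
# Sketch of the pattern: generate once, register, call later by name.
# A dict-backed registry stands in for an MCP server's tool table.
from typing import Any, Callable, Dict


class ToolRegistry:
    """Stand-in for an MCP server's tool table."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Step 3: the tested function becomes a callable tool.
        self._tools[name] = fn

    def call(self, name: str, **kwargs: Any) -> Any:
        # Step 4: later, the agent invokes the tool with arguments only.
        return self._tools[name](**kwargs)


# Steps 1-2: the agent emitted this implementation once and it passed tests.
def campaign_send_bulk(recipients: list, template: str) -> dict:
    messages = [template.format(name=r) for r in recipients]
    return {"sent": len(messages), "messages": messages}


registry = ToolRegistry()
registry.register("campaign_send_bulk", campaign_send_bulk)

result = registry.call(
    "campaign_send_bulk",
    recipients=["Ada", "Lin"],
    template="Hi {name}, here is our promotion.",
)
```

The agent's runtime responsibility shrinks to choosing the tool name and supplying arguments; the loop body never re-enters prompt space.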

The Core Idea

Instead of asking an LLM to “figure out how to do the task” every time, you ask it once to generate a robust implementation, then treat that implementation as infrastructure.

Prompting becomes orchestration. Code becomes execution. MCP becomes the stable interface.

Email Campaign Example

You want to send a promotion email to N clients.

The naive approach: the agent re-reasons the flow on every run, improvising a plan and invoking an email tool once per client, with dedupe and error handling living in the prompt.

The compiled-tool approach: the agent generates a bulk-send implementation once; it is tested and registered in the MCP server, and every later campaign is a single call.

Now the agent just calls one MCP tool:

campaign_send_bulk(args)

The expensive cognition was paid once. Execution is fast, repeatable, and auditable.
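What gets paid for once is the function body itself. A hypothetical body for `campaign_send_bulk` might look like the following; the `deliver()` transport is a stub standing in for a real email gateway, and the dedupe and failure-handling logic lives in code rather than in prompt space.

```python
# Hypothetical body of the compiled campaign_send_bulk tool.
def deliver(address: str, body: str) -> None:
    # Stub transport; a real implementation would call an email gateway.
    if "@" not in address:
        raise ValueError(f"bad address: {address}")


def campaign_send_bulk(recipients, template):
    seen, sent, failed = set(), [], []
    for addr in recipients:
        if addr in seen:            # dedupe: each client mailed at most once
            continue
        seen.add(addr)
        try:
            deliver(addr, template)
            sent.append(addr)
        except ValueError as exc:   # failure handling: collect, don't abort
            failed.append({"address": addr, "error": str(exc)})
    return {"sent": sent, "failed": failed}


report = campaign_send_bulk(
    ["a@example.com", "a@example.com", "not-an-address"],
    "Our spring promotion is live.",
)
```

Every run returns a structured report the agent can inspect, instead of a transcript it has to re-read.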

Why This Is Better

1) Token Cost Drops

Once logic is compiled, repeated runs need only compact arguments. You avoid re-deriving the same loop, dedupe checks, and failure handling in prompt space.

2) Latency Drops

A single tool call can do what would otherwise require many LLM turns and tool invocations.

3) Determinism Improves

Compiled code with fixed inputs gives bounded behavior. You can still have nondeterministic external systems, but your control logic is stable.

4) Safety Improves

You can gate execution through code review, static checks, integration tests, allowlists, and permissioned MCP routing.
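An allowlist gate at the tool boundary can be sketched in a few lines. The tool and caller names are illustrative; a real MCP server would enforce this in its routing layer rather than in application code.

```python
# Sketch of permissioned routing: each tool lists which callers may use it.
ALLOWED_CALLERS = {
    "campaign_send_bulk": {"marketing-agent"},
    "invoice_generate": {"billing-agent"},
}


def route_call(caller: str, tool: str, fn, **kwargs):
    # Deny by default: unknown tools have an empty allowlist.
    if caller not in ALLOWED_CALLERS.get(tool, set()):
        raise PermissionError(f"{caller} may not call {tool}")
    return fn(**kwargs)


ok = route_call("marketing-agent", "campaign_send_bulk",
                lambda n=0: {"sent": n}, n=5)
```

Because all execution flows through one boundary, the gate is written once and applies to every tool.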

5) Observability Improves

You can instrument one tool boundary with structured logs, metrics, idempotency keys, and audit events.
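Because there is a single boundary, one decorator can add structured audit events and idempotency-key replay to every tool. The names below are illustrative, and the in-memory cache stands in for durable storage.

```python
# Sketch of instrumenting the tool boundary: audit log + idempotency cache.
import json
import time
from functools import wraps

AUDIT_LOG = []
_RESULTS = {}  # idempotency cache keyed by (tool, key)


def instrumented(tool_name):
    def wrap(fn):
        @wraps(fn)
        def inner(*args, idempotency_key=None, **kwargs):
            cache_key = (tool_name, idempotency_key)
            if idempotency_key is not None and cache_key in _RESULTS:
                return _RESULTS[cache_key]      # replay, don't re-execute
            start = time.time()
            result = fn(*args, **kwargs)
            AUDIT_LOG.append(json.dumps({    # structured audit event
                "tool": tool_name,
                "duration_ms": round((time.time() - start) * 1000, 2),
                "idempotency_key": idempotency_key,
            }))
            if idempotency_key is not None:
                _RESULTS[cache_key] = result
            return result
        return inner
    return wrap


@instrumented("campaign_send_bulk")
def campaign_send_bulk(n):
    return {"sent": n}


first = campaign_send_bulk(3, idempotency_key="run-1")
second = campaign_send_bulk(3, idempotency_key="run-1")  # replayed from cache
```

Retried agent runs with the same idempotency key replay the stored result instead of re-sending the campaign.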

Other High-Value Examples

Invoicing and Collections

Generate and compile:

Behavior:

CRM Follow-Up Engine

Generate and compile:

Behavior:

Data Quality Repair

Generate and compile:

Behavior:

Contract Operations

Generate and compile:

Behavior:

Agentic ETL

Generate and compile:

Behavior:

A Practical Lifecycle for “Code-First Agents”

  1. Spec stage
  2. Code generation stage
  3. Verification stage
  4. Registration stage
  5. Runtime stage
  6. Evolution stage
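The hinge of this lifecycle is the gate between verification and registration: generated code is only promoted to tool status if it passes its checks. A minimal sketch, with `verify` and the registry as illustrative stand-ins:

```python
# Sketch of the verification -> registration gate for generated code.
def verify(fn, cases):
    """Run example-based checks against the generated function."""
    return all(fn(*args) == expected for args, expected in cases)


TOOLS = {}


def promote(name, fn, cases):
    # Registration is refused unless verification passes.
    if not verify(fn, cases):
        raise RuntimeError(f"{name} failed verification; not registered")
    TOOLS[name] = fn


# Produced in the code generation stage:
def normalize_phone(raw: str) -> str:
    return "".join(ch for ch in raw if ch.isdigit())


promote("normalize_phone", normalize_phone, [
    (("(555) 123-4567",), "5551234567"),
])
```

In the evolution stage, a new version goes through the same gate and replaces the registry entry only after it, too, passes.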

What Must Be In Place

This pattern is powerful only if your platform enforces boundaries:

Without these, generated code can be faster but not safer.

Frontier Model Code Quality: Reality Check

Frontier models are now strong enough to generate useful production code, but quality varies by task shape.

They are generally strong at:

They are inconsistent at:

So the right stance is neither “trust the model” nor “never trust the model.”

The right stance is:

A Useful Quality Rubric for Generated Tool Code

Before promoting generated code to MCP tool status, score it on:

If a tool fails any of these, do not publish it.

Economic Intuition

Think of this as replacing repeated “reasoning compute” with reusable “execution compute.”

At low volume, prompt-only may be fine. At medium/high volume, compiled tooling usually wins on cost, speed, and reliability.
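The break-even arithmetic is easy to sketch. All numbers below are made up for illustration: a per-run reasoning cost for the prompt-only path versus a one-time generation cost plus cheap argument tokens for the compiled path.

```python
# Illustrative break-even model with assumed (not measured) numbers.
PRICE_PER_1K_TOKENS = 0.01  # assumed blended token price, USD


def prompt_only_cost(runs, tokens_per_run=4000):
    # Every run re-derives the plan in reasoning tokens.
    return runs * tokens_per_run / 1000 * PRICE_PER_1K_TOKENS


def compiled_cost(runs, generation_tokens=20000, args_tokens_per_run=200):
    # One-time generation cost, then only compact arguments per run.
    one_time = generation_tokens / 1000 * PRICE_PER_1K_TOKENS
    return one_time + runs * args_tokens_per_run / 1000 * PRICE_PER_1K_TOKENS


# First run count at which the compiled path is cheaper.
break_even = next(n for n in range(1, 10_000)
                  if compiled_cost(n) < prompt_only_cost(n))
```

Under these assumed numbers the compiled path wins after a handful of runs; the real crossover depends on your prices and task sizes, but the shape of the curve is the point.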

The Strategic Shift

The future agent stack is not “bigger prompts and more instructions.”

It is:

That is how you get systems that are cheaper, faster, safer, and repeatable.

Not by asking the model to do everything every time.

By letting the model write code once, and letting infrastructure do the rest.