By James Aspinwall
The Model Context Protocol is reshaping how AI agents interact with tools, data, and services. As MCP adoption accelerates, the question for engineering leaders shifts from “should we support MCP?” to “what do we build it on?”
Most teams reach for Python or TypeScript — familiar, fast to prototype, easy to hire for. But MCP servers aren’t prototypes. They’re infrastructure. They sit between your AI layer and your business-critical systems, handling concurrent sessions from multiple agents, each with its own tool calls, context windows, and failure modes. They need to stay up.
Elixir, running on the BEAM VM with OTP, inherits a runtime shaped by more than 35 years of telecom engineering for exactly this class of problem.
Processes Map Directly to MCP Sessions
Every MCP session is a stateful, long-lived conversation between a client and your server. In most runtimes, you manage this with connection pools, session stores, and careful threading. In Elixir, each session is simply a process — a lightweight, isolated unit of computation that costs roughly 2KB of memory.
A single BEAM node comfortably runs millions of these processes. Each MCP session gets its own process with its own state, its own mailbox, its own lifecycle. No shared mutable state. No locks. No thread pool tuning. The concurrency model isn’t bolted on — it’s the foundation.
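You can see this directly in an IEx shell. The sketch below spawns a million idle processes, each blocking on its own mailbox; it assumes nothing beyond the standard library, and the process count is only there to make the point.

```elixir
# Each BEAM process starts at roughly 2KB; spawning a million of them
# is routine on a single node. Each has its own heap and mailbox.
pids =
  for _ <- 1..1_000_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

length(pids)

# Tear them all down; each process exits and its memory is reclaimed.
Enum.each(pids, &send(&1, :stop))
```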
When an AI agent opens an MCP session, your server spawns a process. When the agent disconnects, the process terminates and its memory is reclaimed. There is no cleanup code to write and no leaked resources to hunt down.
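A per-session process might look like the following minimal GenServer. The module and message shapes are illustrative, not taken from any particular MCP library:

```elixir
defmodule MCP.Session do
  @moduledoc "One process per MCP session. Names here are illustrative."
  use GenServer

  # Spawn a session process when an agent connects.
  def start_link(session_id),
    do: GenServer.start_link(__MODULE__, session_id)

  # Each session carries its own isolated state: no locks, no shared memory.
  @impl true
  def init(session_id),
    do: {:ok, %{id: session_id, tool_calls: 0}}

  # Tool calls arrive as messages in this process's private mailbox.
  @impl true
  def handle_call({:tool_call, name, args}, _from, state) do
    result = {:ok, "ran #{name} with #{inspect(args)}"}
    {:reply, result, %{state | tool_calls: state.tool_calls + 1}}
  end

  # When the agent disconnects, the process exits normally and the VM
  # reclaims its memory; there is no cleanup code to write.
  @impl true
  def handle_cast(:disconnect, state),
    do: {:stop, :normal, state}
end
```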
Crash Isolation That Actually Works
Here’s a scenario every MCP operator will face: a tool handler throws an unexpected error. Maybe a database query times out, an external API returns garbage, or a prompt injection triggers an edge case in your parser.
In a threaded runtime, that crash can poison shared state, corrupt connection pools, or bring down the entire process. You write defensive code everywhere and hope your error boundaries hold.
In Elixir, a crash is contained to the single process that failed. Every other MCP session continues unaffected. OTP supervisors — battle-tested process monitors with configurable restart strategies — detect the failure and restart the process in milliseconds. The crashed session reconnects and resumes. No operator intervention. No pager alert at 3am.
This isn’t theoretical resilience. Erlang/OTP powers WhatsApp (2 million connections per server), Discord (5 million concurrent users), and telecom switches with nine-nines uptime. The supervision patterns that keep phone networks running are the same ones watching your MCP tool handlers.
Hot Code Reload: Deploy Without Downtime
MCP servers are living systems. You’ll update tool definitions, fix handler bugs, adjust authentication logic, and add new capabilities — all while agents are actively connected and working.
Elixir supports hot code upgrades at the module level. Compile new code, load it into the running system, and it picks up the changes. Active sessions continue on the old code path until their next call, then seamlessly transition to the new version. No rolling restarts. No dropped connections. No load balancer draining.
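A full hot upgrade involves OTP release tooling (appup/relup files); at the process level, participation comes down to one callback. This hypothetical registry sketches that hook:

```elixir
defmodule MCP.ToolRegistry do
  # Hypothetical registry process participating in an OTP release upgrade.
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{tools: %{}}}

  # During a hot upgrade, OTP suspends this process, calls code_change/3
  # so the old state can be migrated to the new module's shape, then
  # resumes it — the process, and every session talking to it, stays up.
  @impl true
  def code_change(_old_vsn, state, _extra) do
    # Illustrative migration: v2 adds a :schema_version field.
    {:ok, Map.put_new(state, :schema_version, 2)}
  end
end
```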
For an MCP framework, this means you can ship a new tool handler in production without interrupting a single agent conversation. For a CTO, this means your AI integrations have the same deployment characteristics as telecom infrastructure.
Dynamic Server Management
MCP frameworks need to manage multiple server instances — different tool sets for different teams, tenant-isolated servers for enterprise customers, ephemeral servers spun up for specific workflows.
OTP’s DynamicSupervisor makes this trivial. Start a new MCP server with a function call. Stop it with another. The supervision tree handles process lifecycle, restarts on failure, and cleanup on shutdown. You don’t build orchestration logic — you configure it declaratively and let OTP handle the rest.
Need to spin up 50 isolated MCP server instances, each with its own tool registry and authentication context? That’s 50 supervised process trees, each independently managed, each independently restartable. In Elixir this is a few lines of code. In most other runtimes, it’s a Kubernetes deployment problem.
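Those few lines look roughly like this. `MCP.Server` is a hypothetical per-instance GenServer standing in for a real server with its own tool registry and auth context:

```elixir
defmodule MCP.Server do
  # Hypothetical per-instance server process.
  use GenServer
  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)
  @impl true
  def init(opts), do: {:ok, Map.new(opts)}
end

# One DynamicSupervisor manages all instances.
{:ok, _} = DynamicSupervisor.start_link(strategy: :one_for_one, name: MCP.Instances)

# Spin up 50 isolated, independently restartable instances at runtime.
pids =
  for n <- 1..50 do
    {:ok, pid} =
      DynamicSupervisor.start_child(MCP.Instances, {MCP.Server, tenant: "tenant-#{n}"})

    pid
  end

# Tear one down without touching the other 49.
DynamicSupervisor.terminate_child(MCP.Instances, hd(pids))
```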
Built-In Observability
You can’t operate what you can’t see. MCP servers need deep visibility: which sessions are active, what tools are being called, where latency is accumulating, which processes are consuming resources.
The BEAM VM provides introspection that other runtimes simply don’t have:
- Process inspection: Query any running process for its state, message queue length, memory usage, and call stack — in production, without restarting anything.
- :observer: A built-in graphical tool showing real-time process trees, message flows, memory allocation, and scheduler utilization.
- Telemetry: Elixir’s telemetry library provides standardized event emission for latency tracking, throughput measurement, and custom metrics — all feeding into your existing Prometheus/Grafana/Datadog stack.
- Tracing: Enable per-process tracing in production without code changes. Follow a single MCP session through every function call, tool invocation, and message pass.
When an AI agent reports that a tool call is slow, you don’t grep through logs. You attach to the running system, inspect the specific session process, and see exactly where time is being spent.
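In practice that inspection is a one-liner. This sketch uses `self()` so it runs anywhere; in a real system the pid would come from a Registry lookup for the session in question:

```elixir
# From a remote IEx shell attached to the production node, inspect any
# live session process — no restart, no redeploy.
pid = self()  # stand-in for a Registry lookup of the slow session

Process.info(pid, [:message_queue_len, :memory, :current_function, :status])

# For the graphical view — process trees, schedulers, memory allocation:
# :observer.start()
```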
The Mature Ecosystem
Elixir isn’t new. Erlang has been in production since 1986, and OTP’s supervision patterns have been refined across four decades of running systems that cannot go down.
The Elixir ecosystem provides:
- Phoenix for HTTP/WebSocket transport — the same framework handling millions of real-time connections at scale
- Plug for composable middleware — authentication, rate limiting, and request transformation as stackable components
- Ecto for database interaction — if your MCP tools need structured data access
- Nx and Livebook for ML integration — when your MCP server needs to do more than proxy to external models
- Nerves for embedded/edge deployment — MCP servers running on IoT devices at the network edge
The standard library includes everything MCP servers need: JSON parsing, HTTP clients, cryptographic primitives, ETS for in-memory caching, and Mnesia for distributed state. No dependency sprawl. No left-pad incidents.
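The ETS point is worth making concrete: in-memory caching with no external store and no added dependency. Table and key names below are illustrative:

```elixir
# ETS as a tool-result cache — no Redis, no extra dependency.
:ets.new(:tool_cache, [:set, :public, :named_table, read_concurrency: true])

# Cache a tool result alongside its insertion time.
:ets.insert(:tool_cache, {"weather:paris", %{temp_c: 18}, System.system_time(:second)})

case :ets.lookup(:tool_cache, "weather:paris") do
  [{_key, value, _inserted_at}] -> value  # cache hit
  [] -> :miss                             # cache miss
end
```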
What This Means for Your Architecture
An MCP server framework built on Elixir/OTP gives you:
| Capability | What It Means Operationally |
|---|---|
| Process-per-session | Millions of concurrent agent sessions on a single node |
| Supervision trees | Automatic recovery from failures without operator intervention |
| Hot code reload | Deploy tool updates without dropping active sessions |
| Dynamic supervisors | Spin up/down isolated server instances at runtime |
| BEAM observability | Inspect any session’s state in production, in real time |
| Preemptive scheduling | No single tool call can starve other sessions |
| Distribution | Scale across nodes with built-in clustering, no external coordinator |
These aren’t features you implement. They’re properties of the runtime. Your team writes business logic — tool handlers, authentication, authorization — and OTP handles the operational complexity that would otherwise consume half your engineering effort.
The Honest Trade-Off
Elixir’s hiring pool is smaller than Python’s or TypeScript’s. That’s the real trade-off, and it’s worth naming directly.
But consider what you’re hiring for. MCP server infrastructure is a small, critical system — not a 200-person monolith. A small team of Elixir engineers will build and operate an MCP framework that a much larger team would struggle to match in a runtime not designed for this workload. The language is approachable (Ruby-like syntax, excellent documentation, a welcoming community), and experienced backend engineers typically become productive in weeks.
You’re also hiring against a simpler operational footprint. Less infrastructure to manage. Fewer failure modes to handle. Less glue code between your application and your orchestration layer. The total cost of ownership — engineering time plus infrastructure plus operational burden — favors the right tool for the job.
The Bottom Line
MCP servers are concurrent, stateful, long-lived, failure-prone, and operationally demanding. These are the exact properties that Erlang/OTP was invented to handle, and that Elixir makes accessible with modern syntax and tooling.
You can build MCP servers on any runtime. But on Elixir with OTP, you inherit 35 years of battle-tested solutions to problems you haven’t hit yet — and when you do hit them at 2am with agents down and customers waiting, you’ll be glad the runtime was designed for exactly that moment.