The Alarm Module and Elixir's Unfair Advantage in AI Orchestration

By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 3, 2026, 05:55


There is a module in our codebase called Alarm. It is 443 lines of Elixir. It schedules future actions, persists them in SQLite, recovers them after crashes, and chains them into multi-step workflows. It took a day to build and has never lost a scheduled task.

If I had built the same thing in Python, Go, or TypeScript, I would still be debugging the infrastructure around it.

That is not an exaggeration. It is the consequence of a fundamental difference in how the BEAM virtual machine manages processes compared to every other mainstream runtime. And it is the reason Elixir is quietly becoming the best language for AI agent orchestration — even though almost nobody is talking about it.

What Alarm Actually Does

Alarm is a GenServer — a long-running process managed by the BEAM — that handles scheduled task execution. An AI agent or a user says “send me a notification tomorrow at 9am.” Alarm stores that instruction in SQLite, calculates the delay, and hands it to Timer, a companion process that holds a single Process.send_after reference. When the timer fires, Timer delivers the message to the target process — Pushover for notifications, Scheduler for workflow steps, any registered GenServer — and Alarm loads the next pending task from the database.

The design is simple:

MCP Client → Alarm.set_timer("tomorrow at 9am", Pushover, {:send, %{title: "Standup"}})
           → stored in SQLite
           → Timer.set_timer(unix_timestamp, Pushover, message, id)
           → [delay elapses]
           → GenServer.cast(Pushover, {:send, %{title: "Standup"}})
           → Pushover sends HTTP POST to notification API
           → Alarm.fired(id) marks it complete
           → Alarm loads next pending task from SQLite
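The Timer half of that flow fits in a few lines. This is a hedged sketch rather than the real module: the state shape, the `max/2` clamp on the delay, and the exact message tuple are assumptions, while `Alarm.fired/1` is the callback named above.

```elixir
defmodule Timer do
  @moduledoc "Holds the single pending Process.send_after reference (illustrative sketch)."
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def set_timer(fire_at_unix, target, message, id) do
    GenServer.cast(__MODULE__, {:set, fire_at_unix, target, message, id})
  end

  @impl true
  def init(_), do: {:ok, %{ref: nil}}

  @impl true
  def handle_cast({:set, fire_at, target, message, id}, state) do
    # Only one timer is pending at a time; cancel any previous one.
    if state.ref, do: Process.cancel_timer(state.ref)
    delay_ms = max(fire_at - System.os_time(:second), 0) * 1000
    ref = Process.send_after(self(), {:fire, target, message, id}, delay_ms)
    {:noreply, %{state | ref: ref}}
  end

  @impl true
  def handle_info({:fire, target, message, id}, state) do
    # Deliver as a cast so a down target cannot crash the timer,
    # then let Alarm mark the task complete and load the next one.
    GenServer.cast(target, message)
    Alarm.fired(id)
    {:noreply, %{state | ref: nil}}
  end
end
```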

If Alarm crashes, the supervisor restarts it. On restart, init/1 queries SQLite for all pending alarms and reschedules the next one. Nothing is lost. No external queue. No Redis. No message broker. Just a process, a database, and the BEAM.
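The recovery path is little more than an init/1 that asks the database what the last incarnation left behind. A sketch, where `Store.next_pending/0` is a hypothetical helper that queries SQLite for the earliest pending row:

```elixir
@impl true
def init(_opts) do
  # On every (re)start, reschedule the earliest pending task.
  # Store.next_pending/0 is a hypothetical SQLite query helper.
  case Store.next_pending() do
    nil -> :ok
    alarm -> Timer.set_timer(alarm.fire_at, alarm.target, alarm.message, alarm.id)
  end

  {:ok, %{}}
end
```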

If the target process is down when the alarm fires, the message is delivered as a cast — it does not block or fail. When the target process restarts (via its own supervisor), it picks up from a clean state. If Alarm cannot deserialize a stored task — because a module was renamed or removed — it marks the alarm as failed with a reason, skips it, and moves to the next one.
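The deserialization guard can be sketched the same way. `Store` and the string-encoded target field are assumptions here, not the real schema:

```elixir
defp dispatch(alarm) do
  # to_existing_atom raises if the stored module name no longer exists.
  target = String.to_existing_atom(alarm.target)
  GenServer.cast(target, alarm.message)
  Store.mark_fired(alarm.id)
rescue
  ArgumentError ->
    # Module renamed or removed: record the reason, skip, move on.
    Store.mark_failed(alarm.id, "unknown target: #{alarm.target}")
end
```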

This is not exceptional engineering. This is Tuesday in Elixir. And that is the point.

What the Same Thing Looks Like Elsewhere

Python: Celery + Redis + Worker Processes

In Python, the equivalent of Alarm requires Celery, a message broker (Redis or RabbitMQ), a results backend, a worker pool, and a beat scheduler for periodic tasks.

The architecture looks like this:

Your Code → Celery Task → Redis Broker → Worker Process → Execution
                                      → Beat Scheduler (for periodic)
                                      → Results Backend (Redis/DB)

That is four moving pieces (broker, worker pool, beat scheduler, results backend) to schedule a notification for tomorrow morning.

Celery’s scheduled tasks (using eta or countdown) reside in worker memory until they are ready to execute, which causes memory pressure on long delays. The default prefetch behavior acknowledges tasks early — meaning if a worker crashes mid-execution, the task is gone. The broker considers it delivered. You lose work silently unless you configure acks_late=True and reject_on_worker_lost=True, options that are poorly documented and non-obvious.

Celery is powerful. It handles distributed task execution across multiple machines. But for a single-server AI orchestrator that needs to schedule future actions reliably? It is a freight train when you need a bicycle.

External dependencies: Redis (or RabbitMQ), Celery workers, Celery beat, results backend.
Lines of infrastructure code: 200+ before you write business logic.
Crash recovery: Manual configuration, easy to get wrong, fails silently by default.

Go: Goroutines + Custom Persistence + Manual Recovery

Go gives you goroutines — lightweight threads that are cheap to spawn (2KB each). But goroutines have no supervision. If a goroutine panics and you have not wrapped it in a recovery handler, it takes down the entire application.

Building Alarm in Go means:

// You need:
// 1. A goroutine manager that tracks running timers
// 2. A persistence layer (SQLite, Postgres)
// 3. A recovery mechanism that reloads pending tasks on startup
// 4. A way to cancel running timers by ID
// 5. Mutex locks around shared state
// 6. Graceful shutdown that persists in-flight tasks

Go’s scheduler was cooperative for most of the language’s history: a goroutine ran until it yielded through I/O, a channel operation, or an explicit runtime.Gosched(), and a CPU-intensive loop could starve every other goroutine. Go 1.14 added asynchronous, signal-based preemption, which removes the worst starvation cases but remains coarse-grained. The BEAM’s scheduler is preemptive by design: it interrupts every process after a fixed reduction count, guaranteeing that no single process monopolizes the CPU.

Go has no equivalent of OTP. No supervision trees. No standardized process lifecycle. No built-in mechanism for “when this process dies, restart it with a clean state and re-initialize from persistent storage.” You build all of that yourself. Every Go team builds it differently. Most build it with bugs.

External dependencies: None required (Go compiles to a single binary), but you still write the supervision and recovery logic manually.
Lines of infrastructure code: 400-600 for a production-grade scheduler with persistence and recovery.
Crash recovery: Entirely your responsibility. No framework support.

TypeScript/Node.js: Bull + Redis + Worker Threads

Node.js is single-threaded. A scheduled task that performs CPU-intensive work blocks the event loop, freezing the entire application. The standard solutions — Bull (Redis-backed queue), Bree (worker threads), or node-cron — each come with tradeoffs.

Bull requires Redis:

Your Code → Bull Queue → Redis → Worker Thread → Execution

Bull is mature and reliable, but it is a queue system, not an orchestration primitive. It does not know about process lifecycle, does not have supervision strategies, and does not recover state from persistent storage automatically. If your Node.js process restarts, pending in-memory timers are gone. Only what was pushed to Redis survives.

Node-cron and node-schedule keep timers in memory. Process restarts lose everything. There is no built-in persistence. You add it yourself, and you add the recovery logic yourself, and you add the crash handling yourself.

External dependencies: Redis (for Bull), or accept that restarts lose state.
Lines of infrastructure code: 150-300 for a Bull-based approach.
Crash recovery: Redis-backed queue survives, but in-flight execution state is lost on crash.

Why the BEAM Is Different

The comparison above is not about language syntax or developer ergonomics. It is about what the runtime gives you for free.

Processes Are Not Threads

A newly spawned BEAM process costs a few hundred machine words of memory, roughly 2-3KB on a 64-bit system. It has its own heap, its own garbage collection, and its own mailbox. When a process dies, only its memory is reclaimed; no other process is affected. The BEAM can run millions of processes concurrently on a single machine.

Go goroutines start at a comparable 2KB of stack, but they share one address space. Python threads share memory and are constrained by the GIL. Node.js has one thread and simulates concurrency through the event loop. The BEAM has true preemptive concurrency with per-process isolation.

This matters for orchestration because an orchestrator manages many concurrent tasks. Each task needs its own state, its own lifecycle, and its own failure boundary. In Elixir, each task is a process. In every other language, you simulate this with varying degrees of success.
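The isolation claim is easy to check from a plain elixir script. Each spawned process below gets its own heap and mailbox, a hundred thousand of them fit comfortably under the default process limit, and killing one touches nothing else:

```elixir
# Spawn 100_000 processes, each waiting on its own mailbox.
pids =
  for _ <- 1..100_000 do
    spawn(fn ->
      receive do
        :stop -> :ok
      end
    end)
  end

IO.puts("spawned #{length(pids)} isolated processes")

# Kill one process; the rest are unaffected.
[victim | rest] = pids
Process.exit(victim, :kill)
IO.puts("survivors still alive: #{Enum.all?(rest, &Process.alive?/1)}")
```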

Supervision Is Built In

OTP supervision trees are the BEAM’s answer to “what happens when something goes wrong.” A supervisor watches its child processes. When one crashes, the supervisor restarts it according to a strategy: :one_for_one restarts only the crashed child, :one_for_all restarts all children together, and :rest_for_one restarts the crashed child and every child started after it.

In our application, Alarm and Pushover sit under the main supervisor with :one_for_one. If Alarm crashes, Pushover keeps running. Alarm restarts, reads its pending tasks from SQLite, and continues. No data loss. No manual intervention. No monitoring system needed to detect the crash and restart the process.
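In code, that tree is almost the whole story. A sketch using the module names from this article (Timer’s place in the child list is an assumption):

```elixir
# The application supervisor: each child restarts independently.
children = [
  Alarm,    # scheduled task state + SQLite persistence
  Timer,    # the single Process.send_after holder
  Pushover  # notification delivery
]

Supervisor.start_link(children, strategy: :one_for_one)
```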

This is not a library you install. It is the runtime. Every Elixir application has supervision by default.

Preemptive Scheduling

The BEAM scheduler gives each process a fixed number of reductions (roughly, function calls) before preempting it and switching to another process. This means no single process can starve the system. A slow database query in one process does not freeze the orchestrator. A runaway loop in one task does not block other tasks from executing.

Go needed asynchronous preemption (added in Go 1.14) to stop CPU-bound goroutines from starving the rest of the program. Node.js’s event loop blocks on synchronous operations. Python’s asyncio requires every coroutine to explicitly await to yield control. The BEAM enforces fairness at the VM level: short of a misbehaving native NIF, you cannot write code that breaks the scheduler.
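The fairness guarantee is observable from a short script: a process stuck in a pure CPU loop is still preempted once it exhausts its reduction budget, so other processes keep running.

```elixir
parent = self()

# A pure CPU spin: no I/O, no receive, nothing that yields voluntarily.
spawn(fn -> Enum.each(Stream.cycle([0]), fn _ -> :ok end) end)

# This process still gets scheduled and delivers its message.
spawn(fn -> send(parent, :still_alive) end)

receive do
  :still_alive -> IO.puts("scheduler stayed fair")
after
  5_000 -> IO.puts("starved (this should not happen on the BEAM)")
end
```

On a multi-core machine this passes trivially, so run it with a single scheduler thread (elixir --erl "+S 1" fairness.exs) to see the preemption itself at work; the message still arrives.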

Hot Code Reloading

Elixir can replace running code without stopping the application. With a properly built release, deploying a new version of Alarm swaps the module while existing processes keep running with their current state. Pending alarms are not interrupted. Active timers keep counting.

No other mainstream runtime supports this as a first-class feature. Go requires a restart. Python requires a restart. Node.js requires a restart. Every restart means potential lost state, interrupted tasks, and recovery logic.

What This Means for AI Orchestration

An AI orchestrator coordinates agents, tools, and scheduled actions. It needs to:

  1. Manage many concurrent tasks — agents calling tools, waiting for responses, processing results
  2. Schedule future actions — reminders, follow-ups, periodic checks, delayed notifications
  3. Survive failures — a crashed tool call should not take down the orchestrator
  4. Persist state — scheduled actions must survive restarts
  5. Chain actions — the output of one task triggers the next
  6. Enforce fairness — a slow LLM response should not block other agents

Elixir gives you 1, 3, and 6 for free from the runtime. Items 2, 4, and 5 require a small amount of code — Alarm is 443 lines — because the primitives (GenServer, supervision, message passing) are already there.

In Python, you need Celery + Redis for (2), custom recovery logic for (4), Celery chains or Airflow for (5), and asyncio discipline for (6). In Go, everything is manual. In Node.js, you need Bull + Redis for (2), custom persistence for (4), and worker threads for (6).

The Alarm module is a small proof of a large point: in Elixir, building reliable scheduled orchestration is a module-level concern. In other languages, it is an infrastructure-level concern. That is the difference between spending a day on it and spending a quarter on it.

From Alarm to Orchestration Engine

Alarm today schedules one-shot actions: send a notification at 9am, trigger a scheduler step at 10:30pm. But the pattern it embodies — persistent scheduling, crash recovery, process-targeted message delivery, and action chaining — is the foundation of a full orchestration engine.

Consider what happens when you extend the pattern:

Recurring alarms. An alarm that, when fired, reschedules itself. A follow-up reminder that repeats every week until the contact responds. The module does not need to know about recurrence — the target process handles it by calling Alarm.set_timer again in its handle_cast.
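As a sketch, here is a weekly follow-up that reschedules itself from its own handle_cast. `Contacts.responded?/1` and the message shape are hypothetical:

```elixir
def handle_cast({:weekly_followup, contact} = message, state) do
  # Contacts.responded?/1 is a hypothetical helper.
  unless Contacts.responded?(contact) do
    GenServer.cast(Pushover, {:send, %{title: "Follow up with #{contact.name}"}})
    # Recurrence is just the target scheduling itself again.
    Alarm.set_timer("in 7 days", __MODULE__, message)
  end

  {:noreply, state}
end
```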

Conditional chains. Alarm fires → target process evaluates a condition → schedules the next action or stops. A sales pipeline that moves contacts through stages automatically, with each stage scheduled as an alarm that checks whether the previous condition was met.

Multi-agent coordination. Agent A completes a task and schedules an alarm to notify Agent B in 30 minutes. Agent B receives the notification, performs its work, and schedules an alarm for Agent C. The orchestration happens through time-delayed message passing between supervised processes — no central coordinator needed.

Backpressure through scheduling. Instead of executing all tasks immediately, schedule them with slight delays to prevent overwhelming downstream services. Rate-limit API calls by spacing alarms 5 seconds apart. The database acts as a natural queue.
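Spacing the alarms out is a short loop over the task list; `ApiWorker`, the message shape, and the five-second interval are illustrative:

```elixir
# Schedule each task 5 seconds after the previous one instead of all at once.
tasks
|> Enum.with_index(1)
|> Enum.each(fn {task, i} ->
  Alarm.set_timer("in #{i * 5} seconds", ApiWorker, {:call, task})
end)
```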

Each of these patterns is a small extension of what Alarm already does. No new infrastructure. No new dependencies. Just more entries in the SQLite table and more processes receiving messages.

The Honest Limitation

Everything described here runs on a single BEAM node. For a single-server AI orchestrator, which is what most organizations actually need, that is ideal. But if you need to distribute work across multiple machines, the picture changes.

Go compiles to a single binary that deploys anywhere. Python has mature distributed task frameworks (Celery, Airflow, Prefect). The BEAM has distributed Erlang, which allows processes on different nodes to communicate transparently, but clustering BEAM nodes is operationally more complex than deploying Go binaries behind a load balancer.

For the use case we are building — a multi-user AI orchestrator running on a single server with per-user databases, access control, and 86+ MCP tools — Elixir’s single-node strengths are exactly right. The day we need multi-node distribution, we will deal with it then. Premature distribution is a worse engineering sin than premature optimization.

The Bottom Line

The Alarm module is 443 lines. It provides persistent, crash-recoverable, chainable task scheduling with natural language time parsing. It took a day to build. It has never lost a scheduled task.

The equivalent in Python is Celery plus Redis plus recovery configuration plus monitoring. In Go, it is a custom scheduler with manual persistence, manual recovery, manual supervision, and manual concurrency control. In TypeScript, it is Bull plus Redis or accept that restarts lose state.

Elixir does not make these other languages bad. They are excellent languages for many things. But for building an AI orchestrator — a system that manages concurrent agents, schedules future actions, recovers from failures, and chains complex workflows — the BEAM gives you so much for free that the comparison is not close.

The right tool for the right job. For orchestration, the right tool was built in 1986 for telephone switches. It just happens to be exactly what AI agents need forty years later.