What an AI Agent Gateway Actually Needs: Core vs Bloat

The current AI Agent Gateway codebase has grown to 122 modules in lib/, 23 permission wrappers, and 18 MCP tool handlers. A lot of that is genuinely load-bearing; a lot of it is features that accreted around the load-bearing parts and could live as separate projects. This article walks through what is what, names the minimum viable gateway you would actually want to ship, and lists the design principles and library choices worth carrying forward into a clean rewrite.

The goal is not to deprecate the current code. It is to identify the spine.

The spine

If you stripped the current project to the smallest version that still does the job a gateway exists to do, this is what survives.

Identity and tokens

That is the entirety of the authentication surface. The gateway does not invent its own crypto; it uses Plug’s primitives plus Argon2 for password storage.

Capability-based permission system

Every protected module follows the same shape: business logic in Foo, permission wrapper in Permissions.Foo, MCP handler in MCPServer.Tools.Foo. That structure is the actual product. Without it there is no governance, just an LLM hitting endpoints.
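The three-layer shape can be sketched in a few lines. This is a minimal illustration, not the gateway's actual code: the `Sketch.*` module names, the `notes.write` key, and the shape of the auth map are all assumptions made for the example.

```elixir
defmodule Sketch.Notes do
  # Business logic: pure functions, no awareness of permissions.
  def create(title) when is_binary(title) and title != "", do: {:ok, %{title: title}}
  def create(_), do: {:error, :invalid_title}
end

defmodule Sketch.Permissions.Notes do
  # Permission wrapper: the only layer that performs the check.
  # The key name is hypothetical.
  @permission "notes.write"

  def create(%{permissions: perms}, title) do
    if MapSet.member?(perms, @permission) do
      Sketch.Notes.create(title)
    else
      {:not_allowed, @permission}
    end
  end
end

defmodule Sketch.MCPServer.Tools.Notes do
  # MCP handler: parse arguments, call the wrapper, shape the response.
  def call("notes_create", %{"title" => title}, auth) do
    case Sketch.Permissions.Notes.create(auth, title) do
      {:ok, note} -> %{"result" => %{"title" => note.title}}
      {:not_allowed, key} -> %{"error" => "missing permission: " <> key}
      {:error, reason} -> %{"error" => Atom.to_string(reason)}
    end
  end
end
```

The handler never inspects the permission set itself; it only translates whatever the wrapper returns.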

Persistence

There is no Ecto. There is no Postgres. There is one SQLite file per concern, on local disk. That choice is load-bearing: single-customer instances, file-system backups, no external database to operate.

MCP transport (server side)

This is the half-day-to-understand piece. Once you grasp the dispatch table, the entire MCP surface is composed of handler modules following the same pattern.
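A dispatch table of this kind might look roughly like the following. The sketch assumes dispatch by tool-name prefix; the prefixes, the handler modules, and the `call/2` signature are hypothetical stand-ins rather than the gateway's real interface.

```elixir
defmodule Sketch.AlarmTools do
  def call(name, _args), do: {:ok, "alarm handler ran " <> name}
end

defmodule Sketch.DocTools do
  def call(name, _args), do: {:ok, "doc handler ran " <> name}
end

defmodule Sketch.Dispatch do
  # Hypothetical table: tool-name prefix -> handler module.
  @table [
    {"alarm_", Sketch.AlarmTools},
    {"doc_", Sketch.DocTools}
  ]

  # Find the first prefix that matches and hand the call to its module.
  def dispatch(tool_name, args) when is_binary(tool_name) do
    case Enum.find(@table, fn {prefix, _mod} -> String.starts_with?(tool_name, prefix) end) do
      {_prefix, mod} -> mod.call(tool_name, args)
      nil -> {:error, :unknown_tool}
    end
  end
end
```

Adding a tool family is then one new table row plus one handler module that follows the same pattern.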

MCP transport (client side)

The dual-role design (server and client) is the gateway’s value proposition. An agent talks to one endpoint, the gateway talks to ten. The audit trail and permission gating happen in between.

OpenAPI bridge (REST -> MCP synthesis)

Why this belongs in the core: it is the lever that turns “ten REST APIs the customer already has” into “ten MCP tool surfaces the agent can use” without writing one Elixir adapter per API. Drop an OpenAPI spec into the bridge, set the auth credential, grant the permission key, and the agent now has typed access to that API through the same permission-gated, audit-logged pipeline as every other tool. The work that used to be a Permissions wrapper plus an MCP handler plus a Plug router for each integration collapses into a one-time registration call.

The bridge does not replace hand-written adapters when behavior matters more than the spec. The Box, Google, and WhatsApp integrations in the current codebase are hand-written because their workflows have semantics (multi-step OAuth flows, file streaming, polling) that an OpenAPI dispatcher cannot synthesize. But for the long tail of “I have a REST API, I want an MCP tool for it,” the bridge eliminates the per-API engineering cost.

REST API and web UI (the three-transport rule for core modules)

Every core protected module in the gateway – meaning every module that ships in the spine, not optional plugins – is reachable on three transports: MCP (for agents), REST (for programmatic clients that don’t speak MCP), and Web (for human operators). Picking one or two for a core module is a mistake. The discipline is to expose all three from day one because each transport serves a constituency the other two cannot.

Optional plugin modules follow a weaker rule: MCP is mandatory, REST and Web are recommended but not required. The split is intentional and is restated in the “Optional plugin modules” paragraph below.

A note on what “web” means here. The *_web.ex views are the operator console: server-rendered HTML pages for the human who runs the gateway. They are not the end-user product UI – a customer who wants a polished end-user experience builds that as a separate front-end project consuming the REST API. The gateway is both a headless API and a self-contained operator console; it is not also a marketing site or a customer-facing SaaS UI.

For each module Foo, the standard fan-out looks like this:

Transport   File                           Calls into
MCP         lib/mcp_server/tools/foo.ex    Permissions.Foo
REST        lib/router/foo_api.ex          Permissions.Foo
Web         lib/foo_web.ex                 Permissions.Foo (renders HTML)

All three call into the same Permissions.Foo wrapper. Permission checks live in the logic layer, never in the transport. The three transport modules are intentionally thin: parse arguments, call the wrapper, format the response in the transport’s native shape (MCP envelope, JSON, HTML).

Core REST/Web modules in the spine:

The rule is uniform: if a module appears in lib/permissions/, it must have a *_web.ex and a router/*_api.ex alongside its mcp_server/tools/ handler. Three files per module. The operator can use any transport. The agent can use any transport (but usually picks MCP). The programmatic integrator can use any transport (but usually picks REST).

Synthesized surfaces (OpenAPI bridge): the MCP tools and REST passthroughs are auto-generated from the registered specs; the operator surface is a single generic page (OpenApiBridgeWeb) for listing registered specs, inspecting synthesized tools, setting credentials, and toggling enable/disable. The operator does not get a hand-crafted page per registered API – the bridge does not have hundreds of operator pages, it has one.

Optional plugin modules (Box, Google, WhatsApp, platform-specific drivers, etc.): each ships at minimum the MCP transport. REST and Web are strongly recommended but optional, because some plugins (notably narrow platform-specific drivers) have no useful operator surface beyond what their parent module already exposes. The contract on the plugin’s permission wrapper stays identical – agents and integrators get the same gated path regardless of how many transports the plugin chose to surface.

Why this matters in practice:

The constituent surfaces (different audiences, different ergonomics) are not negotiable. Skipping the web UI to “ship MCP faster” produces a gateway no operator wants to operate. Skipping REST to “stay pure MCP” produces a gateway no DevOps team can automate. Each transport pays for itself in a different way; together they make the gateway tractable for the three audiences that actually use it.

Scheduling

This is the scheduling primitive that everything else uses. Background jobs, retries, scheduled tasks for agents, periodic health checks – they all dispatch through Alarm so the database row is the source of truth and a restart does not lose them.
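To illustrate what “the database row is the source of truth” implies after a restart, here is a small sketch of the recovery step. It assumes rows are plain maps with `fire_at_ms`, `fired_at`, and `cancelled_at` fields; those field names and the `Sketch.AlarmRecovery` module are invented for the example, and the real Alarm schema may differ.

```elixir
defmodule Sketch.AlarmRecovery do
  # Rows are plain maps here; in the article's design they live in Alarm's SQLite file.
  def pending(rows) do
    Enum.filter(rows, fn row -> is_nil(row.fired_at) and is_nil(row.cancelled_at) end)
  end

  # After a restart, overdue rows get a zero delay and future rows get the
  # remaining time, so every pending alarm is re-armed from its row alone.
  def plan(rows, now_ms) do
    rows
    |> pending()
    |> Enum.map(fn row -> {row.id, max(row.fire_at_ms - now_ms, 0)} end)
  end
end
```

Nothing in the plan depends on in-memory timer state, which is the property that lets scheduled work survive a restart.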

Workflow orchestration

Workflows are the orchestration primitive that turns single-tool calls into multi-step business processes. An agent doesn’t write a 500-line script to “process this invoice, route to AP, post a notification, and wait for finance approval.” It instantiates a workflow template with three steps and lets the executor handle the state machine, including the human-approval pause. Workflows depend on Alarm for scheduled steps, Notifier for step-ready signals, FunctionNode for deterministic transformations between steps, the LlmGateway for the limited steps that need natural-language parsing, and the permission system for who can do what at each step.

Function nodes (sandboxed compute for workflow glue)

Why this belongs in the core: real workflows are mostly deterministic plumbing between domain-specific MCP calls. Pull a contract PDF from Box. Extract a date. Compare it to today. If within 30 days, post a Slack message via the company’s MCP server. The MCP calls are the “Box pull” and “Slack post” boxes; everything between them is data shaping that has no business going through an LLM. Function nodes let workflow authors write that glue in a real programming language, deploy it as a versioned, permissioned artifact, and have the workflow executor invoke it just like any other step.

The split of work in a typical workflow:

Without function nodes, the “Most transitions” line collapses into one of two bad options: write an LLM step for every trivial transformation (expensive, slow, non-deterministic) or write Elixir in the gateway codebase for every new shape (rigid, redeploy required, no per-customer isolation). Both are losing strategies for an instance-per-customer product. Function nodes are the third option that makes workflows actually deployable.

Backend choice is a deployment decision, not an architectural one. The current codebase uses Fly.io for production isolation, but the rewrite changes the default to in-process WASM and demotes Fly.io to an optional plugin. The reasoning matters because it materially changes the deployment story:

The contract across backends is the same: invoke a function node, pass typed input, get typed output, log the invocation. The default is in-process; the escape hatches scale up when the workload demands them.
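One way to express that shared contract is an Elixir behaviour that every backend implements. The sketch below is an assumption about shape only: the behaviour name, the `invoke/2` callback, and the `days_until` node are hypothetical, and a plain Elixir function stands in for the in-process WASM runtime.

```elixir
defmodule Sketch.FunctionNode.Backend do
  # Hypothetical behaviour: every backend takes input and returns a tagged tuple.
  @callback invoke(node_id :: String.t(), input :: map()) :: {:ok, map()} | {:error, term()}
end

defmodule Sketch.FunctionNode.InProcess do
  @behaviour Sketch.FunctionNode.Backend

  # Stand-in for the in-process runtime: date glue of the kind workflows need.
  @impl true
  def invoke("days_until", %{"today" => today, "due" => due}) do
    with {:ok, t} <- Date.from_iso8601(today),
         {:ok, d} <- Date.from_iso8601(due) do
      {:ok, %{"days" => Date.diff(d, t)}}
    end
  end

  def invoke(_node_id, _input), do: {:error, :unknown_node}
end

defmodule Sketch.FunctionNode do
  # The backend is a parameter; callers get the result plus an invocation record.
  def invoke(node_id, input, backend \\ Sketch.FunctionNode.InProcess) do
    started = System.monotonic_time(:millisecond)
    result = backend.invoke(node_id, input)
    elapsed = System.monotonic_time(:millisecond) - started
    {result, %{node_id: node_id, backend: backend, elapsed_ms: elapsed}}
  end
end
```

Swapping in a Docker or remote backend would mean passing a different module as the third argument; the caller's code and the invocation record keep the same shape.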

LLM Gateway (multi-provider routing)

Why this belongs in the core: workflows that involve LLM steps are most useful when the choice of model is a configuration variable, not a code dependency. A workflow template can declare “summarize step uses sonnet, extraction step uses gpt-4o-mini, embedding step uses text-embedding-3-small.” Swapping providers becomes a template edit instead of a code change. Customers who already have a contract with OpenAI but want to try Anthropic for one workflow get to do that without a redeploy.

The gateway also gives the audit layer something to hang on. Every model call goes through one path, so cost-per-token, latency, and failure rates are queryable as one table rather than scattered across N integration modules. Permissions are per-provider: llm.anthropic, llm.openai, llm.google – a customer can grant an agent access only to the providers they have a contract with.

Without the gateway, every workflow step that calls an LLM becomes its own integration. With it, the workflow knows nothing about provider APIs; it knows that one of its steps is “call the LLM with these inputs and these constraints.”
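A rough sketch of that routing step, assuming a static alias table and permission keys of the form `llm.<provider>` as described above. The `Sketch.LlmGateway` module, the `route/2` function, and the alias-to-provider mapping are illustrative assumptions.

```elixir
defmodule Sketch.LlmGateway do
  # Hypothetical alias table; in the article's design this is configuration.
  @aliases %{
    "sonnet" => :anthropic,
    "gpt-4o-mini" => :openai,
    "text-embedding-3-small" => :openai
  }

  # Resolve the alias to a provider, then require the matching llm.* key.
  def route(model_alias, granted_keys) do
    with {:ok, provider} <- Map.fetch(@aliases, model_alias),
         key = "llm." <> Atom.to_string(provider),
         true <- MapSet.member?(granted_keys, key) do
      {:ok, provider}
    else
      :error -> {:error, :unknown_model}
      false -> {:not_allowed, :missing_provider_key}
    end
  end
end
```

A workflow step only names the alias; whether the call is permitted is decided from the token's granted keys before any provider adapter is reached.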

Notifications

The shape is generic. Adding email, Slack, or SMS is a new backend, not a new pipeline.

Logging and audit

Audit logging is not a feature; it is a precondition. Strip it and the gateway loses its ability to answer “what did the agent do with this customer’s data.”

Health and monitoring

Server-sent events

Documentation (searchable corpus, agent-readable, no built-in Q&A)

The deliberate design: the gateway exposes the corpus, the index, and the retrieval; the user’s own agent calls doc_search and doc_get over MCP, reads the chunks, and synthesizes the answer using whatever model it already has loaded. The gateway does not run an LLM to answer questions about its own documentation. Three reasons:

  1. No double billing. The user’s agent is already running an LLM. If the gateway also runs one to “answer” the same question, the customer pays twice for the same reasoning.
  2. No model choice imposed on the customer. The user picks the model they want their agent to use. The gateway should not pick a different one for documentation Q&A.
  3. Cleaner permission boundary. Reading documentation is a separate capability from calling an LLM. A token can have doc.read without having any llm.* keys.

Why this belongs in the core anyway: an instance-per-customer deployment is operated by people who did not write the gateway. They need to find “how do I set up a workflow,” “what does workflow_create accept as arguments,” and “why is permission attenuation behaving this way” without trawling the source. Agents have the same problem – before invoking a tool, a well-behaved agent calls doc_search to find out what the tool does and what arguments it expects.

Doc is the corpus in two shapes at once: a browseable manual surface for humans (DocWeb) and a search/retrieval surface for agents (MCP and REST). The content is asset/docs/ (architecture, internal design) plus asset/manuals/ (per-module user guides, split by access method as the project convention requires: *-web.md, *-rest.md, *-mcp.md, *-iex.md). Onboarding a new operator becomes “give them the URL.” Onboarding a new agent becomes “let it call doc_search before it starts.”

Documentation does not depend on the LlmGateway. It can ship independently, earlier in the plan.

Runtime introspection (Tidewave, permission-gated)

Why this belongs in the core: production debugging for a single-customer Elixir instance has two paths. Build a bespoke “inspect this process, inspect this Sqler database, replay this MCP request” UI (months of engineering, never matches what you actually need at 2am). Or gate the existing runtime-introspection tool behind one of the most powerful permission keys in the system, with full audit, and hand it only to operators.

The latter is what works in practice. The discipline:

The point made elsewhere about “fail safely at boundaries” still applies inside Tidewave: a query that errors raises in Tidewave’s scope, is logged with the caller’s username, and does not leave half-state on the system.

External agent invocation

Why this belongs in the core: the gateway’s value proposition is “one place to govern all the AI activity in an organization.” If the gateway cannot dispatch to external agents at all, the customer either builds their own dispatch outside the audit boundary (defeating the point) or forces every workflow through one model (losing the right-tool-for-the-job benefit). The feature has to exist. The work is the audit/permission boundary around it.

This is the feature in the gateway that carries the most responsibility per call. An authorized user with external_agent.anthropic can effectively ask a remote model to do anything the gateway’s other tools allow. The mitigations:

The reason this is in the spine and not in the optional features is that workflows depend on it for any step that needs an outside agent’s judgment – which, for many real customers, is most of the workflow.

Supervision and startup

That is the spine. It is roughly 30 modules covering the foundational concerns above. The full day-one + day-two list (the spine plus workflow, function nodes, OpenAPI bridge, LLM gateway, documentation, external agent, and the *_web.ex views the three-transport rule requires for each core module) brings the total to roughly 62 – the larger number you’ll see in the scope estimate later. Both counts describe the same architecture; the difference is whether you’re counting the foundational concerns alone or the full core-module surface with all transports.

The features

Everything in the current lib/ that is not in the list above falls into one of three buckets.

Useful but optional integrations

These are real product capabilities, but they are not the gateway. They are tools the gateway happens to expose, the way it could expose any external service through the same MCP-tool pattern:

Each is interesting. None is the gateway. In a clean rewrite they would live as separate Elixir libraries, depended on optionally, surfaced via the same Permissions.Foo + MCPServer.Tools.Foo pattern.

Optional content UI features

These are content-management features that happen to be built into the current codebase but are not the gateway. Server-rendered *_web.ex views for each protected core module (workflows, alarms, permissions, audit logs, etc.) stay in the spine – the three-transport rule applies. The pieces above are the editorial/CMS surface, which is a separate concern that can live as its own project consuming the gateway’s REST API.

Demo and developer harnesses

These belong in a separate examples/ repo, not in the production gateway. Keeping them in lib/ makes the codebase look bigger than it is.

What a clean rewrite should ship

If you started fresh today, the minimum viable gateway would have these features in this priority order:

Day-one (the spine, no shortcuts)

  1. Sqler – per-module SQLite, millisecond IDs, optimistic locking. The persistence primitive.
  2. User and SubToken – accounts, Argon2 password storage, sub-tokens with scope and per-key permissions, independent revocation.
  3. AccessControl + AccessControlled – capability-based permission registry with the standard mixin macro.
  4. Permissions.Bootstrap and Registry – deterministic permission key registration on boot.
  5. Auth pipeline – bearer token plug, ws ticket plug, auth context.
  6. REST router skeleton + web UI scaffold – the access-control API and AccessControlWeb, the access-log Plug, the shared helpers, the NavMenu, the login/registration/sub-token web flows. Every protected module ships with all three transports (MCP, REST, Web) from day one.
  7. MCP server – inbound JSON-RPC transport, prefix dispatch table, helpers, session manager.
  8. MCP client – outbound connections, connection registry, per-connection permission keys.
  9. OpenAPI bridge – parser, dispatcher, signed-assertion auth, persistence. Turns registered OpenAPI specs into MCP tools without per-API engineering.
  10. ServerLog + HttpAccessLog – audit and access logs to SQLite.
  11. Mcp.Application + Mcp.Startup – supervision tree, ordered boot steps, fail-loud startup.

Day-two (operational essentials)

  1. Alarm + Timer – scheduled events that survive restart.
  2. Workflow + WorkflowExecutor – multi-step orchestration with templates, step approvals, and persisted state. The primitive that turns single tool calls into business processes.
  3. FunctionNode + Provisioner + Registry – sandboxed code execution for deterministic workflow transitions. The “glue between MCP calls” substrate that workflows depend on.
  4. LlmGateway – multi-provider model routing with per-provider permission keys, normalized request/response, and per-call audit. The substrate workflow LLM steps call into for the limited cases that need natural-language parsing.
  5. Doc + Doc.Index + Doc.Watcher – the searchable documentation surface. FTS5 + sqlite_vec hybrid search over asset/docs/ and asset/manuals/. No built-in Q&A: the agent retrieves chunks via doc_search/doc_get and synthesizes answers with its own model.
  6. ExternalAgent + cross-platform backends – dispatch to Anthropic API, OpenAI API, or a local subprocess (claude -p, codex -p) as a workflow step type or as a permissioned MCP tool. Per-backend permission keys, rate-limited per token, full audit retention. Platform-specific backends ship as separate optional libraries.
  7. Tidewave permission gating – mount Tidewave on the main HTTPS port behind bearer auth, with tidewave.eval, tidewave.docs, tidewave.logs keys. Production introspection without a separate port or a separate auth model.
  8. Notifier + Pushover – generic notification dispatch with at least one backend.
  9. Monitor + MonitorServer – periodic health checks with persisted history.
  10. LoggerNotifier – error logs that can fan out to the notifier.
  11. SsePush – server-to-client notifications for MCP clients that need them.

Day-three (operator UX)

  1. Settings management for backend secrets (Pushover keys, OAuth credentials, TLS paths).
  2. Compile-time observability (CompileLog + AutoCompile) if you want dev ergonomics.

The web UI itself is not on this list because it is not a day-three concern – every protected module above already ships with its own web view as part of the three-transport rule. By the time day-two is done, the operator already has dashboards for permissions, audit logs, health, sub-tokens, workflows, alarms, MCP connections, and API bridge registrations.

Everything else is a feature, not the gateway. Box, Google, WhatsApp, platform-specific terminal drivers, page scrapers, NotebookLM watchers, A2A demos, content/CMS surfaces – those live as plugins.

Design principles to carry forward

Strip the bloat but keep these. They are what makes the gateway maintainable.

Functional core, imperative shell

Business logic lives in pure functional modules. Process management (GenServer state, supervision, lifecycle) lives in dedicated Server modules. A module called TaskManager has its functions; TaskManagerServer has its init/1, handle_call/3, etc. The two are not the same module.
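A minimal sketch of the split, using the TaskManager name from the paragraph above under a `Sketch` namespace. The state shape and the `add` operation are invented for illustration.

```elixir
defmodule Sketch.TaskManager do
  # Functional core: plain data in, plain data out, no processes.
  def new, do: %{next_id: 1, tasks: %{}}

  def add(%{next_id: id, tasks: tasks} = state, title) do
    {id, %{state | next_id: id + 1, tasks: Map.put(tasks, id, title)}}
  end
end

defmodule Sketch.TaskManagerServer do
  # Imperative shell: owns the process and delegates every decision to the core.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def add(server, title), do: GenServer.call(server, {:add, title})

  @impl true
  def init(:ok), do: {:ok, Sketch.TaskManager.new()}

  @impl true
  def handle_call({:add, title}, _from, state) do
    {id, new_state} = Sketch.TaskManager.add(state, title)
    {:reply, {:ok, id}, new_state}
  end
end
```

The core can be tested without starting a process, and the server contains no logic worth testing beyond the delegation.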

Permissions in the logic layer, not the transport

Web, REST, and MCP all pass user permissions down. The business module checks permissions and returns {:not_allowed, reason}. Each transport translates that into its own format (403 JSON, redirect, MCP error response). Never check permissions in a Plug. Never check permissions in a tool handler.
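The translation step can be sketched as three small functions over the same wrapper result. The status codes other than 403, the `/login` redirect target, and the MCP response map are assumptions chosen for the example, not the gateway's actual wire formats.

```elixir
defmodule Sketch.Transport do
  # Each function maps the same wrapper result into one transport's native shape.

  # REST: HTTP status plus a JSON-ready body.
  def to_rest({:ok, value}), do: {200, %{"data" => value}}
  def to_rest({:not_allowed, reason}), do: {403, %{"error" => to_string(reason)}}
  def to_rest({:error, reason}), do: {422, %{"error" => to_string(reason)}}

  # Web: render the page, or redirect when the operator lacks the permission.
  def to_web({:ok, value}), do: {:render, value}
  def to_web({:not_allowed, _reason}), do: {:redirect, "/login"}
  def to_web({:error, reason}), do: {:render_error, to_string(reason)}

  # MCP: an error response the agent can parse.
  def to_mcp({:ok, value}), do: %{"isError" => false, "content" => value}
  def to_mcp({:not_allowed, reason}), do: %{"isError" => true, "content" => "not allowed: #{reason}"}
  def to_mcp({:error, reason}), do: %{"isError" => true, "content" => "error: #{reason}"}
end
```

None of these functions decides whether the caller is allowed; they only format a decision that was already made in the logic layer.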

Three transports per core protected module

Every core module in lib/permissions/ (the spine, not plugins) ships with three transport adapters: an MCP tool handler in lib/mcp_server/tools/, a REST router in lib/router/, and a web view in lib/<module>_web.ex. All three are thin and all three call the same Permissions.* wrapper. A core module that has only one or two transports is incomplete; ship the third before declaring the module done. The discipline ensures that whatever operation an agent can do via MCP, an operator can also do in the browser and a script can do via REST. Plugin modules follow a weaker version (MCP mandatory, REST/Web recommended but optional) – the trade-off documented in the three-transport-rule section above.

Modules own their data

Each protected module owns its Sqler instance, its @permission key, its process lifecycle. No shared schema across modules. No cross-module SQL joins. If two modules need to share data, one of them queries the other’s API.

Tagged tuples at API boundaries

Public functions return {:ok, value} or {:error, reason}. Internal code can raise. Rescue at the boundary and convert to tagged tuples. This is the “always parsable by an LLM” rule: agentic modules in particular should never return surprise shapes.
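As a small illustration of rescuing at the boundary, the sketch below lets an internal helper raise and converts the exception in the public function. The module and function names are hypothetical.

```elixir
defmodule Sketch.Boundary do
  # Internal helper: free to raise on bad input.
  defp parse_amount!(text), do: String.to_integer(text)

  # Public API: always returns a tagged tuple, never raises.
  def parse_amount(text) when is_binary(text) do
    {:ok, parse_amount!(text)}
  rescue
    ArgumentError -> {:error, :invalid_amount}
  end

  def parse_amount(_other), do: {:error, :invalid_amount}
end
```

A caller, human or agent, only ever has to handle two shapes.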

Multi-head functions and with over if/else

For any non-trivial branching, write multi-head functions or a with chain. if/else is reserved for actual boolean gates inside a single function.
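A brief sketch of both styles, using an invented approval step with `status` and `approver` fields; none of these names come from the gateway itself.

```elixir
defmodule Sketch.Approval do
  # Multi-head dispatch on the step's state instead of nested if/else.
  def next(%{status: :pending, approver: nil}), do: {:error, :no_approver}
  def next(%{status: :pending} = step), do: {:ok, %{step | status: :awaiting_approval}}
  def next(%{status: :awaiting_approval} = step), do: {:ok, %{step | status: :approved}}
  def next(%{status: :approved}), do: {:error, :already_approved}

  # A with chain: the first non-matching result falls through unchanged.
  def advance_twice(step) do
    with {:ok, step} <- next(step),
         {:ok, step} <- next(step) do
      {:ok, step}
    end
  end
end
```

Each clause reads as one row of a state table, and the `with` chain stops at the first error without any explicit branching.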

Explicit OTP

@impl true on every GenServer callback. Every process in a supervision tree. No spawned processes that are not children of something.

Singleton GenServers accessed by registered name

Long-lived processes (Alarm, Pushover, Timer, Sqler) are registered by module name. Callers do GenServer.cast(Alarm, ...), not GenServer.cast(pid, ...). Internal helper processes (a per-parent Timer or Sqler) are started inside the parent’s init/1 and owned by the parent.
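The registered-name convention looks roughly like this for a trivial counter process. `Sketch.Counter` is a made-up example, not one of the gateway's singletons.

```elixir
defmodule Sketch.Counter do
  use GenServer

  # Registered under the module name, so callers never hold a pid.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, 0, name: __MODULE__)
  def bump, do: GenServer.cast(__MODULE__, :bump)
  def value, do: GenServer.call(__MODULE__, :value)

  @impl true
  def init(count), do: {:ok, count}

  @impl true
  def handle_cast(:bump, count), do: {:noreply, count + 1}

  @impl true
  def handle_call(:value, _from, count), do: {:reply, count, count}
end
```

Because the name survives a supervisor restart while the pid does not, callers that address the process by name keep working after a crash.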

Soft deletes for audit trails

cancelled_at, started_at, completed_at columns. No DELETE FROM. The history is the audit trail.
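In code, a soft delete is a timestamp update plus a filter. The sketch below uses in-memory maps in place of SQLite rows; the module name and row shape are assumptions.

```elixir
defmodule Sketch.SoftDelete do
  # Cancelling stamps a timestamp; the row itself is never removed.
  def cancel(%{cancelled_at: nil} = row, now), do: {:ok, %{row | cancelled_at: now}}
  def cancel(%{cancelled_at: _}, _now), do: {:error, :already_cancelled}

  # "Live" views filter on the timestamp; an audit view would read every row.
  def active(rows), do: Enum.filter(rows, &is_nil(&1.cancelled_at))
end
```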

Verify and refuse loudly

Boot fails with a clear message if any of SECRET_KEY_BASE, COOKIE_SALT, or ACCESS_CONTROL_KEY is missing. Half-configured production is worse than no production. The same rule applies to enabled-but-unkeyed subsystems: if LlmGateway is enabled in config but no provider has a working API key, boot fails. If Pushover is enabled with no app token, boot fails. If OpenApiBridge has registered specs but no signing key for its signed assertions, boot fails. A gateway that boots but cannot complete the operations its config promises is the silent-failure mode that produces 2am support calls.
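A fail-loud check for the three required variables could be as small as the following. The `Sketch.BootCheck` module and the wording of the error message are assumptions; only the variable names come from the text above.

```elixir
defmodule Sketch.BootCheck do
  @required ~w(SECRET_KEY_BASE COOKIE_SALT ACCESS_CONTROL_KEY)

  # Returns the names that are unset or empty in the given environment map.
  def missing(env) when is_map(env) do
    Enum.filter(@required, fn name -> Map.get(env, name, "") == "" end)
  end

  # Raises with every missing name at once, so the operator fixes them in one pass.
  def verify!(env \\ System.get_env()) do
    case missing(env) do
      [] -> :ok
      names -> raise "refusing to boot, missing: " <> Enum.join(names, ", ")
    end
  end
end
```

Calling `verify!/0` as an early boot step means a misconfigured instance stops before it opens a port.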

Library choices to carry forward

These are the dependencies that earned their place. Use them in the rewrite without re-evaluating:

Optional but proven:

Avoid in the rewrite:

Operational and lifecycle concerns

Architecture and design principles get the most attention because they shape the codebase. The concerns below shape the deployment, and a rewrite that ignores them produces a clean architecture that fails its first production incident. Each is part of the spine.

Authentication surface boundaries

Not every endpoint requires a bearer token. The boundaries:

This separation matters: a leaked session cookie has a different blast radius than a leaked bearer token; the gateway audits and rate-limits both, but the operator should be able to look at any access-log row and know which auth path the request took.

Deployment topology: in-process, beside-the-process, remote

Three concentric rings of execution:

The deployment story follows the rings: a minimal instance runs the in-process ring on a single host with a local SQLite directory. Adding a “beside” ring requires Docker. Adding a “remote” ring requires network egress and the relevant credentials in Secrets. Each ring is opt-in per deployment.

Workflow template migration and compatibility

Workflow templates embed concrete references to tool names, tool versions, function-node ids, function-node versions, LLM model aliases, and permission keys. A gateway upgrade that changes any of those is a template-breaking change unless the rewrite handles it:

Observability for executor pressure and queue saturation

Rate limits and SQLite backpressure protect against blast radius, but the operator also needs to see “the system is healthy but falling behind.” The rewrite ships these signals as first-class metrics, exposed on MonitorWeb and emitted as telemetry events that can fan out to Prometheus/Grafana through a thin adapter:

These are dashboard items, not afterthoughts. The web UI’s monitor page shows them on a single screen; the telemetry feed makes them ingestible by whatever the customer’s ops platform is.

Secret management and rotation

The gateway holds and uses a wide range of secrets: bearer tokens (sub-tokens, including the operator’s tidewave.eval token), signed-assertion private keys for the OpenAPI bridge, OAuth refresh tokens for Google/Box/Microsoft integrations, per-provider LLM API keys, TLS certificates and private keys, Pushover application tokens, and customer-specific webhook secrets. Treating them as EnvironmentFile= lines in a systemd unit is fine for the first three; it does not scale to dozens.

The rewrite ships with one explicit secret store:

The Secrets module is day-one. Not day-three. Without it, the gateway’s “good security posture” claim is false.

Master key recovery and emergency rotation. The master key derived from ACCESS_CONTROL_KEY decrypts the entire Secrets store and the row-level encrypted audit tables. Losing it without a recovery path bricks the deployment. The rewrite ships three explicit mitigations:

This is the most important runbook entry in the deployment. It is documented, tested, and rehearsed before any customer goes live.

Failure semantics for workflows and alarms

Workflows orchestrate side effects. Side effects fail. The rewrite specifies the failure model up front so customers can reason about it rather than discover it in production.

Audit retention and privacy

“Full audit retention” sounds responsible until it becomes the GDPR liability the gateway introduced. The rewrite ships a retention policy that is configurable per-customer and per-table:

Backup and disaster recovery

SQLite-per-module is a load-bearing choice, so backup strategy is part of the architecture, not an operator’s afterthought.

Resource control and blast radius

Permission keys answer “who can do what?” Resource limits answer “how much, how fast, how concurrently?” The gateway ships both.

Versioning and compatibility

The gateway is updated. Tools (especially synthesized ones) and plugins must keep working across updates, or they fail customers who never asked for a breaking change.

Approximate scope of the rewrite

The day-one and day-two lists above are roughly:

The 62-module count is roughly 50% of the current 122-file project. That ratio looks unaggressive at first glance, but the three-transport rule multiplies the per-protected-module file count by 3-4 (Permissions.Foo, MCPServer.Tools.Foo, Router.FooAPI, FooWeb). The actual count of distinct architectural concerns in the spine is closer to 20, not 62. If that count can be cut further by promoting more concerns to optional libraries (the LLM gateway or the OpenAPI bridge are the obvious candidates), the spine shrinks accordingly – but doing so would force every customer who needs LLM access or REST integration to install a separate library, and the three-transport rule would still require three to four files per protected module. The trade-off is real; 20 concerns × ~3 files each is the floor.

You can ship a useful gateway in around 6,000 lines of Elixir, with one SQLite database per subsystem, one supervision tree, and one HTTPS port. Everything beyond that is a feature, not the gateway.

Requirements and plan

Two sections to turn the architecture above into a concrete project: the requirements the rewrite must satisfy, and a phased plan to ship it.

Requirements

Functional

F1. Identity and tokens. Username + Argon2 password storage. Sub-tokens issued as database rows with scope (:full, :partial) and an optional per-key permission set. Independent revocation. Token strings prefixed (st_...).

F2. Capability-based permission registry. Deterministic permission keys hashed at boot. Per-user permission maps held in the access control registry. Keys never leave the server process. TTL support with lazy expiry. Attenuation support for scoped delegation.

F3. Three transports per core protected module. Every core entry in lib/permissions/ ships with an MCP handler, a REST router, and a web view. All three thin, all three call the same Permissions.* wrapper. Optional plugin modules ship MCP at minimum; REST and Web are recommended but optional.

F4. MCP transport. Inbound JSON-RPC over HTTP and SSE. Outbound client for consuming external MCP servers, with a connection registry, per-connection auth, and per-connection permission keys.

F5. OpenAPI bridge. Register a spec, get synthesized MCP tools and REST passthroughs without writing an adapter. Signed-assertion auth for the gateway-to-upstream call.

F6. LLM gateway. One canonical request shape. Provider adapters for at least Anthropic, OpenAI, and Google. Per-provider permission keys. Per-call audit (token counts, latency, cost, optional payload retention).

F7. Workflow orchestration. Multi-step state machine with templates, human-in-the-loop step approvals, and per-user isolation. Workflow rows owned by their creator; cross-user access returns :not_found. admin role can hold workflow.cross_user for support cases.

F8. Function nodes. Sandboxed code execution as workflow step type. Pluggable backend with priority order: in-process WASM (default, zero external deps), local Docker (universal, for richer language runtimes), and Fly.io machines as an optional vendor plugin. Versioned, permissioned, audited per-invocation. The deterministic-transformation substrate that workflows rely on between MCP calls.

F9. Alarm scheduling. One-shot and recurring scheduled events that survive restart. Backed by Alarm’s own Sqler instance; the database row is the source of truth.

F10. Notifications. Generic Notifier dispatcher with at least one backend (Pushover). LoggerNotifier bridge for error fan-out.

F11. Health monitoring. Periodic checks with persisted history. Exposed on all three transports per the rule: monitor_* MCP tools for agents, /api/monitor REST endpoints for scripts, MonitorWeb dashboard for operators.

F12. Audit logging. Server log (BEAM Logger -> SQLite), HTTP access log (one row per request), and per-module action logs in each module’s own Sqler instance. Each is queryable via its own protected module on all three transports (HttpAccessLog, ServerLog, etc.).

F13. Server-sent events. SsePush for outbound notifications to MCP clients that subscribe to changes.

F14. Documentation surface. Searchable (FTS5) and semantic (sqlite_vec) over asset/docs/ and asset/manuals/. Agent-readable: doc_search returns ranked chunks, doc_get fetches by path. The gateway does not synthesize answers; the calling agent does. Doc.Watcher re-indexes on filesystem changes.

F15. Runtime introspection. Tidewave mounted in production behind the same HTTPS port as the rest, gated by three permission keys (tidewave.eval, tidewave.docs, tidewave.logs), full payload audit retention.

F16. External agent invocation. ExternalAgent.dispatch/3 with cross-platform core backends (Anthropic API, OpenAI API, LocalSubprocess) and per-backend permission keys. Default-deny, per-token rate-limited, full audit retention. Platform-specific adapters live in separate optional libraries.

Non-functional

N1. Single instance per customer. No multi-tenancy. Each customer’s deployment owns its data, its keys, its tokens.

N2. SQLite per module. No shared database. No Ecto. Sqler’s millisecond IDs and optimistic locking convention applied uniformly.

N3. Functional core, imperative shell. Business logic in pure modules. Process management in dedicated *Server modules. Tagged tuples at public boundaries.

N4. Permissions in the logic layer. Transports stay thin. A permission check never lives in a Plug or an MCP handler.

N5. Fail-loud boot. Boot aborts with a clear message if any of SECRET_KEY_BASE, COOKIE_SALT, or ACCESS_CONTROL_KEY is missing.

N6. Single supervision tree. The BEAM process itself is single-tree: no auxiliary daemons, no out-of-tree workers, no scripts that start their own GenServers. Everything that runs inside the gateway process starts via Mcp.Application.start/2. The gateway may orchestrate external execution environments – the in-process WASM runtime for most function nodes, Docker containers for richer function-node runtimes, Fly.io machines for the optional multi-machine backend, subprocesses for LocalSubprocess ExternalAgent calls, the Tidewave Plug for runtime introspection – but those are dispatched by in-tree supervisors that own the lifecycle (provisioning, health checks, cleanup), not by side-channel scripts. One supervisor owns each external runtime’s connection from the gateway’s side.

N7. Audit by default. Every action that mutates state writes a row. Soft deletes only (cancelled_at, started_at, completed_at); no DELETE FROM.

N8. Tests run without external services. CI does not need Pushover, Anthropic, OpenAI, Box, or Google. Provider adapters and tool integrations must be mockable at the boundary.

N9. Deployable as a Mix release. Build once, ship a tarball, run on any Linux with matching libc. A Dockerfile is built from the release, not from mix run.

N10. macOS-only modules clearly tagged. Anything that shells out to osascript or similar returns a clean error on Linux and does not break the test suite.
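As an illustration of N5, a fail-loud boot check might look like the following. It is a sketch under assumptions: the Sketch.BootCheck module and its function names are hypothetical, and treating an empty string as missing is a choice made here rather than something the requirement states. The three variable names are the ones listed in N5.

```elixir
defmodule Sketch.BootCheck do
  # The three secrets named in N5.
  @required ~w(SECRET_KEY_BASE COOKIE_SALT ACCESS_CONTROL_KEY)

  # Returns :ok, or {:error, missing_names} in the order declared above.
  # `env` defaults to the process environment; passing a map keeps it testable.
  def check(env \\ System.get_env()) do
    case Enum.filter(@required, &(Map.get(env, &1, "") == "")) do
      [] -> :ok
      missing -> {:error, missing}
    end
  end

  # Raising variant: intended to run before any child process is started.
  def check!(env \\ System.get_env()) do
    case check(env) do
      :ok ->
        :ok

      {:error, missing} ->
        raise "refusing to boot, missing environment variables: " <>
                Enum.join(missing, ", ")
    end
  end
end
```

Calling something like Sketch.BootCheck.check!/0 as the first line of Mcp.Application.start/2 would make a misconfigured deployment stop with one readable message instead of failing later inside a request. Taking the environment as an argument also keeps the check a pure function, in line with N3.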

Plan

A phased plan. Each phase produces a runnable gateway that does strictly more than the previous one. Estimated effort assumes one engineer working steadily; double the calendar time for one engineer doing this part-time around other work.

Phase 0 – Foundation (week 1)

Phase 1 – Identity (week 2)

Phase 2 – Permissions (week 3)

Phase 3 – First three-transport module (week 4)

Phase 4 – MCP transport (inbound + outbound) (week 5)

Phase 5 – OpenAPI bridge (week 6)

MVP candidate. At the end of phase 5 the gateway is usable for deployments whose only requirements are governance, MCP/REST/Web access, and OpenAPI-fed integrations. The developer can switch to the new instance here for everyday use; phases 6 onward add operational depth.

Phase 6 – Scheduling and workflows (week 7)

Phase 7 – Function nodes (week 8)

Phase 8 – Operational essentials (week 9)

Phase 9 – Documentation (week 10)

Phase 10 – LLM gateway (week 11)

Phase 11 – External agent invocation (week 12)

Depends on LlmGateway for the API-based backends.

Phase 12 – Polish and hardening (week 13)

There is no migration phase. The current codebase has no production users; the developer is the only operator and is doing the rewrite. Phase 12 produces a ready-to-deploy gateway that boots cleanly with the developer’s own admin credentials; the developer then configures permissions, tokens, OAuth credentials, and workflow templates fresh on the new instance. The old codebase remains on disk as a reference for porting feature plugins later if needed, but is not running in parallel and is not the source of any data transferred over.

What the plan does not include

Deliberately out of scope for the rewrite, deferred or dropped entirely:

Success criteria for the rewrite

The rewrite is done when:

The discipline

The reason the current codebase ballooned past 100 modules is not bad engineering; it is that every interesting new capability got added to the same lib/ directory. The result is a single project that ships the gateway plus a dozen tools plus a blog CMS plus a contact form plus a documentation site plus the developer’s iTerm helper. Each piece is fine on its own. Together they are 122 files.

The rewrite’s discipline is: anything that is not in the day-one or day-two list becomes a separate library, depended on optionally by deployments that need it. The gateway’s mix.exs should list six or seven mandatory dependencies and a long optional list. Customer deployments pick the feature plugins they want. The core stays small enough that two engineers can hold all of it in their heads.
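To make the mandatory/optional split concrete, a core mix.exs along these lines is one possibility. Everything specific here is an assumption for illustration: the project module and app name, the version requirements, the inclusion of jason, and the two plugin package names (gateway_blog, gateway_contact_form) are hypothetical. Only Plug and Argon2 are named elsewhere in this article.

```elixir
# mix.exs for the gateway core (sketch; names and versions are assumptions)
defmodule Mcp.MixProject do
  use Mix.Project

  def project do
    [app: :mcp, version: "0.1.0", elixir: "~> 1.15", deps: deps()]
  end

  defp deps do
    [
      # Mandatory: the spine cannot boot without these.
      {:plug, "~> 1.15"},
      {:argon2_elixir, "~> 4.0"},
      {:jason, "~> 1.4"},

      # Optional feature plugins (hypothetical package names).
      {:gateway_blog, "~> 0.1", optional: true},
      {:gateway_contact_form, "~> 0.1", optional: true}
    ]
  end
end
```

One caveat on the design choice: in Mix, optional: true does not stop the core project itself from fetching the dependency. It only means that a project depending on the core is not forced to pull it in. The split therefore works best if each customer deployment is its own small Mix project that depends on the core and lists the plugins it wants.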

That, more than any specific feature choice, is what separates a product from a junk drawer.