Deploying the WorkingAgents AI Agent Gateway: Options and Trade-offs

The WorkingAgents AI Agent Gateway runs as a long-lived BEAM process. The default deployment model is one instance per customer – each company runs its own copy on its own server, with its own data, its own auth, its own tools. No shared tenancy, no cross-customer routing. That single architectural choice drives every deployment option below.

This article walks through the deployment paths that work for that model: bare VM with a Mix release, Docker, Kubernetes, and PaaS targets. It also describes what every option needs (TLS, environment variables, SQLite persistence, time sync) regardless of how you ship it.

What runs

The gateway is an Elixir application built with Mix. The release name in mix.exs is orchestrator:

defp releases do
  [
    orchestrator: [
      applications: [mcp: :permanent],
      include_executables_for: [:unix]
    ]
  ]
end

mix release orchestrator produces a self-contained release directory under _build/prod/rel/orchestrator/ – the BEAM, the compiled Erlang/Elixir code, the application’s static assets, and shell scripts to start, stop, and attach to the running node. The release does not require Elixir or Mix on the target host.

The release listens on a single HTTPS port (default 8443, override with the PORT env var) and serves:

Persistent state lives in SQLite files on local disk – one file per subsystem (users, access control registry, audit log, contact forms, blog store, etc.). Sqler – the project’s SQLite wrapper – owns each database.

That description matters because it constrains the deployment shape: stateful, single-node, file-backed. The gateway is not built to be replicated horizontally. The customer-per-instance model means you don’t need horizontal replication; you scale by adding more instances for more customers, not more nodes per customer.

What every deployment needs

Independent of the runtime choice, every WorkingAgents deployment requires:

Environment variables at startup

The release reads these at boot via config/runtime.exs:

Boot fails fast and loud if SECRET_KEY_BASE, COOKIE_SALT, or ACCESS_CONTROL_KEY are missing. That is intentional – a half-configured production instance would be a security incident.
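That fail-fast behavior is easy to mirror in a pre-boot check that runs before the release does. A minimal sketch, assuming the three variable names above are the complete required set (extend the list as config/runtime.exs grows):

```shell
#!/bin/sh
# Pre-flight: collect any required secret that is unset before starting
# the release. The three names are the ones boot refuses to run without.
missing=""
for var in SECRET_KEY_BASE COOKIE_SALT ACCESS_CONTROL_KEY; do
  eval "val=\${$var:-}"          # indirect lookup, POSIX-sh safe
  if [ -z "$val" ]; then
    missing="$missing $var"
  fi
done
if [ -n "$missing" ]; then
  echo "refusing to start, missing:$missing" >&2
else
  echo "env ok"
fi
```

Wired in as a systemd ExecStartPre= or a container entrypoint step, this keeps a half-configured instance from ever reaching the network.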

TLS

Any modern MCP client requires HTTPS on the HTTP transport. Three patterns work:

  1. Reverse proxy terminates TLS – Caddy, nginx, Traefik, or a cloud load balancer holds the cert and forwards plaintext over loopback to the BEAM. Easiest, most flexible.
  2. BEAM terminates TLS directly – point TLS_CERTFILE and TLS_KEYFILE at the cert and key. Works, but every cert renewal needs a graceful restart to pick up new files (or wire in :public_key-level reload).
  3. Self-signed for testing – the bundled cert. Never use this in production exposed to the public internet.
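For pattern 1, the proxy config can be tiny. A sketch with Caddy, assuming the gateway has been switched to plaintext HTTP on loopback (e.g. PORT=8080 with no TLS_CERTFILE set – check how your config/runtime.exs wires this) and that gateway.example.com is the hostname:

```
gateway.example.com {
    reverse_proxy 127.0.0.1:8080
}
```

Caddy obtains and renews the Let’s Encrypt certificate on its own, which sidesteps the restart-on-renewal problem of pattern 2.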

Persistent disk

The SQLite files live in the data/ subdirectory of the BEAM’s working directory. Plan for:

Time sync

The gateway uses millisecond timestamps for record IDs (Sqler’s convention). NTP must be running on the host or in the container. Without it, IDs go backward when the clock drifts, audit logs lie about ordering, and TTL-based expiry breaks.
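The convention is plain millisecond Unix time. A quick way to see it on the host (GNU date is assumed, as on any mainstream Linux distro):

```shell
# Two consecutive millisecond timestamps – 13 digits each.
# On an NTP-synced host the second read never goes below the first.
a=$(date +%s%3N)
b=$(date +%s%3N)
echo "$a -> $b"
```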

Option A: Bare VM with Mix release + systemd

The current production deployment of this project runs this way. Simplest path, fewest moving parts, most control.

Build

On a build host with the right Elixir / Erlang versions (asdf, mise, or matching system packages):

MIX_ENV=prod mix deps.get --only prod
MIX_ENV=prod mix release orchestrator

The release at _build/prod/rel/orchestrator/ is portable to any Linux host with the same libc generation as the build host. A release built on Debian Bookworm runs cleanly on Debian Bookworm or Ubuntu 24.04; mixing glibc versions across major distros breaks.
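A quick way to confirm the build and target hosts match is to compare their libc versions (run on both; assumes glibc-based distros, where ldd reports the GLIBC release):

```shell
# Print the host's glibc release, e.g. "ldd (Debian GLIBC 2.36-9) 2.36".
# The release's NIFs and the BEAM itself link against this.
ldd --version | head -n1
```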

Ship

Tar up the release directory, rsync or scp it to the target host, and extract to /opt/workingagents/. Drop a systemd unit:

[Unit]
Description=WorkingAgents AI Agent Gateway
After=network.target

[Service]
Type=simple
User=workingagents
WorkingDirectory=/opt/workingagents
EnvironmentFile=/etc/workingagents/env
ExecStart=/opt/workingagents/bin/orchestrator start
ExecStop=/opt/workingagents/bin/orchestrator stop
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

/etc/workingagents/env holds the secrets. systemctl enable --now workingagents and you’re running. journalctl -u workingagents -f tails logs.

When it fits

When it doesn’t fit

Option B: Docker container

The project does not ship a top-level Dockerfile today; the existing deploy/function_node/Dockerfile is for the sandboxed Function Node runtime, a separate component. Building one for the gateway is straightforward. Shape:

# Stage 1: build the release
FROM hexpm/elixir:1.18.4-erlang-28.4-debian-bookworm-slim AS build
ENV MIX_ENV=prod
WORKDIR /app
COPY mix.exs mix.lock ./
RUN mix local.hex --force && mix local.rebar --force && mix deps.get --only prod
COPY config config
COPY lib lib
COPY asset asset
RUN mix release orchestrator

# Stage 2: runtime
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y openssl libstdc++6 libsqlite3-0 ca-certificates tzdata && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=build /app/_build/prod/rel/orchestrator ./
EXPOSE 8443
ENTRYPOINT ["/app/bin/orchestrator"]
CMD ["start"]

Notes:

Compose example

services:
  gateway:
    image: workingagents/gateway:latest
    restart: unless-stopped
    ports: ["8443:8443"]
    volumes:
      - gateway-data:/app/data
    env_file: ./env.prod

volumes:
  gateway-data:

That is the smallest viable production setup. Put a Caddy or nginx container in front for TLS termination and you have a complete deployment.
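The proxy can live in the same Compose file. A sketch using Caddy, assuming a Caddyfile sits next to docker-compose.yml and points at the gateway service over the Compose network:

```yaml
services:
  caddy:
    image: caddy:2
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data   # persists the ACME account and certificates

volumes:
  caddy-data:
```

With the proxy terminating TLS, you would drop the gateway’s published 8443 port and let the two services talk only over the internal network.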

When it fits

When it doesn’t fit

Option C: Kubernetes

Possible, often overkill. The gateway’s single-node stateful shape means a Kubernetes deployment is structurally a StatefulSet with replicas=1 and a PersistentVolumeClaim. You are using Kubernetes to manage one pod that mounts one volume – a workload its scheduler is not built to make better.

Where Kubernetes does earn its cost:

Where it doesn’t:

If you do go this route, the StatefulSet pattern looks like:

apiVersion: apps/v1
kind: StatefulSet
metadata: { name: workingagents-gateway }
spec:
  serviceName: workingagents-gateway
  replicas: 1
  selector: { matchLabels: { app: gateway } }
  template:
    metadata: { labels: { app: gateway } }
    spec:
      containers:
      - name: gateway
        image: workingagents/gateway:1.x
        ports: [{ containerPort: 8443 }]
        envFrom: [{ secretRef: { name: gateway-env } }]
        volumeMounts: [{ name: data, mountPath: /app/data }]
  volumeClaimTemplates:
  - metadata: { name: data }
    spec:
      accessModes: ["ReadWriteOnce"]
      resources: { requests: { storage: 50Gi } }
      storageClassName: local-ssd

Plus a Service, an Ingress with TLS, and a Secret holding the env vars. The deployment story is then kubectl apply per release.
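One addition worth making to that pod spec is health probes. Since no HTTP health endpoint is documented here, a TCP probe on the listen port is the safe sketch (swap in httpGet if the gateway exposes a health route):

```yaml
livenessProbe:
  tcpSocket: { port: 8443 }
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  tcpSocket: { port: 8443 }
  initialDelaySeconds: 5
  periodSeconds: 10
```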

Option D: PaaS (Fly.io, Render, Railway, etc.)

For a single-customer deploy where the customer is OK with a managed platform, a PaaS removes a lot of operational work. The shape:

The Fly Volumes trade-off is worth knowing: they are local SSD on the Fly machine, which is what SQLite wants – but they are bound to a specific Fly region. The gateway cannot fail over to another region without rsync-ing the volume first. For a single-instance deployment that’s acceptable.
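A minimal fly.toml for that shape might look like the following; the app name, region, and volume name are illustrative, and internal_port matches the gateway’s default:

```toml
app = "acme-workingagents"
primary_region = "fra"

[mounts]
  source = "gateway_data"
  destination = "/app/data"

[http_service]
  internal_port = 8443
  force_https = true
```

The volume has to exist before first deploy (fly volumes create gateway_data --region fra), and it pins the machine to that region.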

When PaaS fits

When it doesn’t

Things to plan for, every option

Independent of the chosen runtime:
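One item that belongs on that list for every runtime is backups. SQLite’s .backup dot-command uses the online backup API, so it is safe to run against a live writer; the file names below are illustrative, since the exact per-subsystem databases aren’t listed here:

```shell
# Snapshot every per-subsystem database, date-stamped.
# The CREATE/INSERT line is only a demo so there is a database to copy;
# drop it when pointing this at a real data/ directory.
mkdir -p data backups
sqlite3 data/users.db "CREATE TABLE IF NOT EXISTS t(x); INSERT INTO t VALUES (1);"
for db in data/*.db; do
  sqlite3 "$db" ".backup 'backups/$(basename "$db" .db)-$(date +%F).db'"
done
ls backups
```

Because the copy is consistent even while the gateway keeps writing, the loop can go straight into a cron job or systemd timer.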

Decision shortcut

A rough heuristic:

The deployment shape that matters most isn’t the runtime – it’s the operational surface around it: backups, monitoring, cert renewal, and a tested upgrade path. Pick the runtime your team can operate. The gateway runs fine on any of them.