External Agent Gateway: Full Design for WorkingAgents

External agents – Claude Code, OpenAI Codex, Gemini CLI, custom agents on LangChain or CrewAI – connect to WorkingAgents through MCP endpoints. Today, the MCP transport authenticates the agent, checks permissions on each tool call, and returns results. That’s access control.

A gateway is more than access control. It manages the full session lifecycle: how many calls an agent makes, how fast, at what cost, what patterns emerge, and when to cut the agent off. The difference matters because external agents are autonomous. They decide what to call and when. Without session-level governance, a single agent can exhaust resources, exfiltrate data through multi-call sequences, or loop until the budget is gone.

This article lays out the complete design for an external agent gateway in WorkingAgents.

What Exists Today

Layer	What it does	What it doesn’t do
MCP Transport (`/sse`, `/mcp`)	Authenticates via bearer token or session cookie. Maintains session ID in Registry. Routes JSON-RPC to handler.	No session state beyond user_id. No tracking of call history, cost, or patterns.
Permission Guards (`Permissions.*`)	Capability-based key check on every tool call. Guard clause at the function head. O(1).	Only answers “is this allowed?” Not “is this safe?” or “is this expensive?” or “is this suspicious?”
Tool Audit (`ToolAudit`)	Logs tool name, user, status, duration to SQLite.	Does not log arguments or results. Cannot correlate calls into sessions or detect patterns.

What the Gateway Adds

The gateway sits between the MCP transport and tool execution. It operates at two levels:

Call level – inspect and gate each individual tool call (arguments, rate limits, injection scanning).

Session level – track state across the lifetime of an MCP connection (call history, cost accumulator, pattern detection, circuit breaking).

External Agent
  -> MCP Transport (auth, session ID)
    -> Gateway Session Manager (per-session state)
      -> Call-Level Proxy
        -> PreFlight: argument guard, rate limit, injection scan
        -> Permission Guard (existing)
        -> Tool Execution (existing)
        -> PostFlight: result scan, size limit, audit
      -> Session-Level Checks
        -> Budget accumulator
        -> Sequence detector
        -> Velocity monitor
        -> Circuit breaker
    -> Result returned to agent

Architecture

Session Manager

Each MCP session gets a lightweight process that tracks session state. Created when the agent connects, destroyed on disconnect or idle timeout.

defmodule AgentGateway.Session do
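  # :transient so the :normal stop on idle timeout is not treated as a crash
  # and restarted by the supervisor (the default restart is :permanent).
  use GenServer, restart: :transient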

  defstruct [
    :session_id,
    :user_id,
    :username,
    :connected_at,
    call_count: 0,
    cost_usd: 0.0,
    call_history: [],         # last N tool names for pattern detection
    consecutive_errors: 0,
    circuit_open: false,
    circuit_opened_at: nil,   # set when the breaker trips; read by check_circuit/1
    last_call_at: nil
  ]

  @max_history 50
  @idle_timeout_ms 1_800_000  # 30 minutes

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  def call_tool(session_pid, name, args, user) do
    GenServer.call(session_pid, {:call_tool, name, args, user}, 120_000)
  end

  @impl GenServer
  def init(opts) do
    state = %__MODULE__{
      session_id: opts[:session_id],
      user_id: opts[:user_id],
      username: opts[:username],
      connected_at: System.system_time(:millisecond)
    }
    {:ok, state, @idle_timeout_ms}
  end

  @impl GenServer
  def handle_call({:call_tool, name, args, user}, _from, state) do
    with :ok <- check_circuit(state),
         :ok <- check_rate(state),
         :ok <- check_budget(state),
         {:ok, args} <- AgentGateway.PreFlight.run(state, name, args),
         result <- execute_tool(name, args, user),
         {:ok, result} <- AgentGateway.PostFlight.run(state, name, result) do
      new_state = record_success(state, name, result)
      AgentGateway.Audit.log(new_state, name, args, result, :ok)
      {:reply, result, new_state, @idle_timeout_ms}
    else
      {:rejected, reason} = rejection ->
        new_state = record_rejection(state, name, reason)
        AgentGateway.Audit.log(new_state, name, args, nil, rejection)
        {:reply, {:error, format_error(reason)}, new_state, @idle_timeout_ms}
    end
  end

  @impl GenServer
  def handle_info(:timeout, state) do
    # Idle timeout -- session dies, resources freed
    {:stop, :normal, state}
  end
end

Key design choice: the session is a GenServer, not a data structure in ETS. This gives us:

Process isolation – one session crashing doesn’t affect others
Supervision – crashed sessions restart cleanly, with counters reset to their initial values
Message queue – concurrent tool calls from the same agent serialize naturally
Idle timeout – inactive sessions self-terminate

Session Registry

Sessions are tracked in an Elixir Registry, keyed by session ID. The MCP transport already stores session IDs in McpSessionRegistry. The gateway extends this with per-session processes.

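defmodule AgentGateway.SessionRegistry do
  # Assumes AgentGateway.Registry is a Registry started with `keys: :unique`
  # and AgentGateway.SessionSupervisor is a running DynamicSupervisor.

  def start_session(session_id, user_id, username) do
    opts = [session_id: session_id, user_id: user_id, username: username]

    # Register the session process itself under its session ID via a :via name.
    # Calling Registry.register/3 here would register the *caller* (the
    # transport process), and the entry would vanish when that process exits.
    child_spec = %{
      id: AgentGateway.Session,
      start: {GenServer, :start_link, [AgentGateway.Session, opts, [name: via(session_id)]]},
      # :transient: restart after a crash, but not after a :normal idle-timeout stop
      restart: :transient
    }

    DynamicSupervisor.start_child(AgentGateway.SessionSupervisor, child_spec)
  end

  def get_session(session_id) do
    case Registry.lookup(AgentGateway.Registry, session_id) do
      [{pid, _value}] -> {:ok, pid}
      [] -> {:error, :session_not_found}
    end
  end

  # Returns [{session_id, pid}] for every live session.
  def active_sessions do
    Registry.select(AgentGateway.Registry, [{{:"$1", :"$2", :_}, [], [{{:"$1", :"$2"}}]}])
  end

  defp via(session_id), do: {:via, Registry, {AgentGateway.Registry, session_id}}
end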
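
The registry module assumes a Registry and a DynamicSupervisor are already running under the names it references. A minimal sketch of that wiring follows. The AgentGateway.Supervisor module name and the :rest_for_one strategy are assumptions for illustration; the design itself only names session_supervisor.ex.

```elixir
defmodule AgentGateway.Supervisor do
  # Hypothetical top-level supervisor for the gateway. The module name is an
  # assumption; the child names match those used by SessionRegistry.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl Supervisor
  def init(_opts) do
    children = [
      # Unique keys: at most one process per session ID.
      {Registry, keys: :unique, name: AgentGateway.Registry},
      # Sessions are started on demand, one child per MCP connection.
      {DynamicSupervisor, name: AgentGateway.SessionSupervisor, strategy: :one_for_one}
    ]

    # :rest_for_one restarts the session supervisor whenever the registry
    # restarts, so sessions are never indexed in a registry that forgot them.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

Adding AgentGateway.Supervisor to the application's supervision tree would bring both processes up before the first agent connects.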

Integration Point

One change in my_mcp_server_router.ex. The call_tool/3 function routes through the session:

Before:

defp call_tool(name, args, user) do
  MyMCPServer.Manager.call_tool(name, args, user.id)
end

After:

defp call_tool(name, args, user, session_id) do
  case AgentGateway.SessionRegistry.get_session(session_id) do
    {:ok, session_pid} ->
      AgentGateway.Session.call_tool(session_pid, name, args, user)
    {:error, :session_not_found} ->
      {:error, %{code: -32000, message: "Session expired"}}
  end
end

The session ID is already available in both transport handlers (/sse and /mcp). It just needs to be passed through to call_tool.

Call-Level Features

Argument Guard

Scans tool arguments for injection patterns. Same design as the MCP proxy article – SQL injection, path traversal, prompt injection regex. Free-text tools (search, send message) are excluded from injection scanning to avoid false positives.
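
A rough sketch of what such a guard could look like. The module name, the three regexes, and the exact list of free-text tools are illustrative assumptions, not the patterns from the MCP proxy article; a production list would be broader and tuned against real traffic.

```elixir
defmodule AgentGateway.ArgumentGuardSketch do
  # Assumption: these tools take free text and are skipped to avoid false positives.
  @free_text_tools ["knowledge_search", "agentmail_send_message"]

  # Returns :ok or {:rejected, {:injection, label}} for the first matching pattern.
  def scan(tool_name, args) when is_map(args) do
    if tool_name in @free_text_tools do
      :ok
    else
      args |> string_values() |> Enum.find_value(:ok, &match_pattern/1)
    end
  end

  # Illustrative patterns only, built at call time rather than stored in a
  # module attribute.
  defp patterns do
    [
      sql_injection: ~r{('|")\s*(or|and)\s+\d+\s*=\s*\d+|;\s*drop\s+table|union\s+select}i,
      path_traversal: ~r{\.\.[/\\]},
      prompt_injection: ~r{ignore\s+(all\s+)?(previous|prior)\s+instructions}i
    ]
  end

  defp match_pattern(value) do
    Enum.find_value(patterns(), fn {label, regex} ->
      if Regex.match?(regex, value), do: {:rejected, {:injection, label}}
    end)
  end

  # Collect every string nested anywhere inside the arguments.
  defp string_values(map) when is_map(map), do: Enum.flat_map(Map.values(map), &string_values/1)
  defp string_values(list) when is_list(list), do: Enum.flat_map(list, &string_values/1)
  defp string_values(bin) when is_binary(bin), do: [bin]
  defp string_values(_other), do: []
end
```

The {:rejected, reason} return shape matches what the session's with block already expects from a failed pre-flight check.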

Rate Limiter

Per-session rate limiting with configurable limits per tool:

@default_rpm 60
@tool_limits %{
  "knowledge_search" => 20,
  "summary_request" => 5,
  "agentmail_send_message" => 10,
  "whatsapp_send" => 5,
  "fetch_url" => 15
}

Implementation: sliding window counter in the session state. No ETS needed – the session GenServer owns its own state.
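
One way to keep a sliding window as plain data inside the session state is sketched below. The module name and the `calls` map (tool name to recent timestamps) are assumptions; the session struct shown earlier does not yet carry per-call timestamps. The sketch also assumes @default_rpm applies per tool when a tool has no explicit limit.

```elixir
defmodule AgentGateway.RateLimiterSketch do
  @window_ms 60_000
  @default_rpm 60
  @tool_limits %{"knowledge_search" => 20, "summary_request" => 5}

  # `calls` maps tool name => timestamps (ms) of calls inside the window.
  # Returns {:ok, updated_calls} or {:rejected, {:rate_limited, tool, limit}}.
  def check(calls, tool_name, now_ms) do
    # Drop timestamps that have slid out of the 60-second window.
    recent = calls |> Map.get(tool_name, []) |> Enum.filter(&(now_ms - &1 < @window_ms))
    limit = Map.get(@tool_limits, tool_name, @default_rpm)

    if length(recent) < limit do
      {:ok, Map.put(calls, tool_name, [now_ms | recent])}
    else
      {:rejected, {:rate_limited, tool_name, limit}}
    end
  end
end
```

Because old timestamps are pruned on every check, each tool's list stays bounded by its limit.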

Result Size Limiting

Tool results can be arbitrarily large. A knowledge_get on a 50KB document returns the full content. The agent sends that entire result to the LLM in the next prompt, burning tokens.

The gateway truncates results above a configurable threshold and appends a notice:

[Result truncated to 8000 characters. Use knowledge_get with the document ID for the full content.]

This saves the agent (and the user) money without breaking functionality.
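
A minimal sketch of the truncation step, assuming the tool result has already been rendered to a string. The module name is hypothetical, and the notice here is generic; a per-tool hint like the one above would be appended the same way.

```elixir
defmodule AgentGateway.ResultLimiterSketch do
  @default_max_chars 8_000

  # Returns the text unchanged when it fits, otherwise the first `max_chars`
  # characters followed by a truncation notice.
  def limit(text, max_chars \\ @default_max_chars) when is_binary(text) do
    if String.length(text) <= max_chars do
      text
    else
      # String.slice/3 counts graphemes, so multi-byte characters are not split.
      String.slice(text, 0, max_chars) <>
        "\n[Result truncated to #{max_chars} characters.]"
    end
  end
end
```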

Argument and Result Logging

Every tool call is logged with full arguments and a result summary:

%{
  session_id: "abc123",
  user_id: 1769919584059,
  username: "james",
  tool_name: "knowledge_search",
  args: %{"query" => "deployment checklist", "k" => 5},
  result_summary: "5 results, top: 'Deployment Checklist' (distance: 0.23)",
  status: :ok,
  duration_ms: 142,
  call_number: 7,        # 7th call in this session
  session_cost_usd: 0.04,
  created_at: 1742284800000
}

This goes beyond ToolAudit, which only logs name, user, status, and duration. The gateway audit captures what was asked and what came back.

Session-Level Features

Budget Accumulator

Each session tracks cumulative cost. Cost is estimated per tool call based on a configurable cost model:

Tool category	Estimated cost per call
Embedding search (knowledge_search, blog_search)	$0.002
Text search (knowledge_search_text, blog_search_text)	$0.0001
Read operations (knowledge_get, nis_get_contact)	$0.0001
Write operations (knowledge_add, nis_create_contact)	$0.001
External calls (fetch_url, agentmail_send_message)	$0.005
Expensive operations (summary_request, blog_import)	$0.01

When cumulative cost exceeds the session budget (configurable per user or per role), subsequent calls are rejected:

{"error": "Session budget exceeded ($5.00 limit, $5.02 spent). Start a new session or contact an administrator."}
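
The accumulator itself can be small. The sketch below keeps costs as floats to match the cost_usd field in the session struct. The module name, the abbreviated tool list, and the fallback cost for unlisted tools are assumptions.

```elixir
defmodule AgentGateway.BudgetSketch do
  # Per-call estimates taken from the cost table; tool list abbreviated.
  @tool_costs %{
    "knowledge_search" => 0.002,
    "knowledge_search_text" => 0.0001,
    "knowledge_get" => 0.0001,
    "knowledge_add" => 0.001,
    "fetch_url" => 0.005,
    "summary_request" => 0.01
  }

  # Assumption: tools missing from the table are charged a small default.
  @default_cost 0.001

  def cost_of(tool_name), do: Map.get(@tool_costs, tool_name, @default_cost)

  # Runs before execution: rejects once the session has spent past its budget.
  def check(spent_usd, budget_usd) when spent_usd > budget_usd do
    {:rejected, {:budget_exceeded, budget_usd, spent_usd}}
  end

  def check(_spent_usd, _budget_usd), do: :ok

  # Runs after a call completes: adds the estimated cost to the running total.
  def charge(spent_usd, tool_name), do: spent_usd + cost_of(tool_name)
end
```

Checking before the call and charging after it means a session can overshoot its budget by at most one call's estimated cost.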

Sequence Detector

Tracks the last N tool calls and flags suspicious patterns:

Read-then-exfiltrate:

knowledge_get followed by agentmail_send_message – reading internal docs then emailing them out
nis_get_contact followed by fetch_url – reading CRM data then sending it to an external URL
access_control_audit_log followed by whatsapp_send – reading security logs then messaging them

Enumeration:

10+ consecutive knowledge_search calls with different queries – systematic content extraction
5+ consecutive nis_list_contacts or nis_list_companies calls – CRM data harvesting

Privilege probing:

Repeated calls to tools that return {:not_allowed, _} – agent testing what it can access

V1 logs warnings. V2 can block or require human approval for flagged sequences.
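
A sketch of the V1 detector over the session's call history, covering two of the three pattern families. It makes two simplifying assumptions: "followed by" means immediately adjacent calls, and enumeration is counted per tool rather than across the nis_list_* pair. The module name is hypothetical.

```elixir
defmodule AgentGateway.SequenceDetectorSketch do
  # Adjacent (read, send) pairs treated as read-then-exfiltrate.
  @exfil_pairs [
    {"knowledge_get", "agentmail_send_message"},
    {"nis_get_contact", "fetch_url"},
    {"access_control_audit_log", "whatsapp_send"}
  ]

  # Minimum run of the same tool that counts as enumeration.
  @enumeration_runs %{
    "knowledge_search" => 10,
    "nis_list_contacts" => 5,
    "nis_list_companies" => 5
  }

  # `history` is a list of tool names, oldest first. Returns a list of flags.
  def flags(history) when is_list(history) do
    exfil =
      history
      |> Enum.chunk_every(2, 1, :discard)
      |> Enum.filter(fn [a, b] -> {a, b} in @exfil_pairs end)
      |> Enum.map(fn [a, b] -> {:read_then_exfiltrate, a, b} end)

    enumeration =
      history
      |> Enum.chunk_by(& &1)
      |> Enum.flat_map(fn [tool | _] = run ->
        case Map.fetch(@enumeration_runs, tool) do
          {:ok, min} when length(run) >= min -> [{:enumeration, tool, length(run)}]
          _ -> []
        end
      end)

    exfil ++ enumeration
  end
end
```

In V1 the returned flags would only be written to the audit log; nothing here blocks a call.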

Velocity Monitor

Tracks call frequency over sliding windows. Alerts on anomalies:

Spike detection: Agent goes from 5 calls/minute (normal) to 50 calls/minute (abnormal). Threshold: 5x the session’s rolling average.
Sustained high rate: Agent maintains 30+ calls/minute for 5+ minutes. Normal interactive use is 2-10 calls/minute.
Off-hours activity: Agent making calls at 3 AM when the user typically works 9-5. (Requires user timezone from profile.)
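
The first two checks can be sketched over per-minute call counts. The bucket representation and module name are assumptions, and off-hours detection is left out because it depends on profile data not shown here.

```elixir
defmodule AgentGateway.VelocitySketch do
  @spike_factor 5
  @sustained_rate 30
  @sustained_minutes 5

  # `per_minute` holds calls-per-minute buckets, oldest first; the last
  # bucket is the current minute. Returns a list of alerts.
  def alerts(per_minute) when is_list(per_minute) and per_minute != [] do
    {history, [current]} = Enum.split(per_minute, -1)

    spike =
      case history do
        [] ->
          []

        _ ->
          # Rolling average over every bucket before the current one.
          avg = Enum.sum(history) / length(history)
          if avg > 0 and current >= @spike_factor * avg, do: [{:spike, current, avg}], else: []
      end

    recent = Enum.take(per_minute, -1 * @sustained_minutes)

    sustained =
      if length(recent) == @sustained_minutes and Enum.all?(recent, &(&1 >= @sustained_rate)) do
        [:sustained_high_rate]
      else
        []
      end

    spike ++ sustained
  end
end
```

With the numbers from the spike example, a session averaging 5 calls/minute that jumps to 50 yields a single spike alert.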

Circuit Breaker

Consecutive errors trip the circuit. The agent can’t make more calls until the circuit resets:

@error_threshold 5
@circuit_reset_ms 60_000

defp check_circuit(%{circuit_open: true, circuit_opened_at: opened_at}) do
  if System.system_time(:millisecond) - opened_at > @circuit_reset_ms do
    :ok  # half-open: allow one call to test
  else
    {:rejected, :circuit_open}
  end
end

defp check_circuit(%{consecutive_errors: n}) when n >= @error_threshold do
  {:rejected, :circuit_open}
end

defp check_circuit(_state), do: :ok

Five consecutive errors: circuit opens. Agent gets an error response explaining the circuit is open. After 60 seconds, one call is allowed through (half-open). If it succeeds, the circuit closes. If it fails, the circuit stays open for another 60 seconds.

This prevents an agent stuck in a retry-on-failure loop from generating hundreds of failed calls.
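
The check function only reads the breaker state; something also has to update it after each call. A sketch of those transitions follows, using a plain map and an explicit clock argument so the behaviour is easy to test. The module and function names are assumptions rather than the contents of circuit_breaker.ex.

```elixir
defmodule AgentGateway.CircuitBreakerSketch do
  @error_threshold 5
  @circuit_reset_ms 60_000

  def new, do: %{consecutive_errors: 0, circuit_open: false, circuit_opened_at: nil}

  # Open circuit: reject until the reset window has elapsed, then let one
  # probe call through (half-open).
  def check(%{circuit_open: true, circuit_opened_at: opened_at}, now_ms) do
    if now_ms - opened_at > @circuit_reset_ms, do: :ok, else: {:rejected, :circuit_open}
  end

  def check(_state, _now_ms), do: :ok

  # A success closes the circuit and clears the error streak.
  def record(state, :ok, _now_ms) do
    %{state | consecutive_errors: 0, circuit_open: false, circuit_opened_at: nil}
  end

  # An error extends the streak. At the threshold, or on a failed half-open
  # probe, the circuit opens and the 60-second timer restarts.
  def record(state, :error, now_ms) do
    errors = state.consecutive_errors + 1

    if errors >= @error_threshold do
      %{state | consecutive_errors: errors, circuit_open: true, circuit_opened_at: now_ms}
    else
      %{state | consecutive_errors: errors}
    end
  end
end
```

Because the session GenServer handles one call at a time, the half-open state admits exactly one probe before the state is updated again.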

Session Lifecycle

Agent connects (/sse or /mcp)
  -> MCP transport authenticates user
  -> AgentGateway.SessionRegistry.start_session(session_id, user_id, username)
  -> Session GenServer starts, begins tracking state

Agent calls tools
  -> Each call routes through AgentGateway.Session.call_tool/4
  -> Call-level checks (argument guard, rate limit)
  -> Session-level checks (budget, circuit breaker)
  -> Permission check (existing guards)
  -> Tool execution
  -> Post-flight checks (result scan, size limit)
  -> Session state updated (call count, cost, history)
  -> Audit logged

Agent disconnects or goes idle
  -> 30-minute idle timeout triggers :timeout
  -> Session GenServer terminates normally
  -> Final session summary logged:
    - Total calls, total cost, duration, tools used, errors, rejections
  -> Registry entry cleaned up

Admin Visibility

The gateway exposes session data through MCP tools for the admin:

gateway_sessions       -- list all active sessions (user, connected_at, call_count, cost)
gateway_session_detail -- get full call history for a session
gateway_kill_session   -- force-terminate a session
gateway_set_budget     -- set per-user session budget
gateway_set_rate_limit -- override rate limits for a user

These would be admin-only tools gated by Permissions.Admin.

File Structure

lib/
  agent_gateway/
    session.ex                # Per-session GenServer
    session_registry.ex       # Session lookup and lifecycle
    session_supervisor.ex     # DynamicSupervisor for sessions
    pre_flight.ex             # Call-level pre-checks
    post_flight.ex            # Call-level post-checks
    argument_guard.ex         # Injection scanning
    result_limiter.ex         # Result size truncation
    budget.ex                 # Cost model and accumulator
    sequence_detector.ex      # Multi-call pattern detection
    velocity_monitor.ex       # Call frequency anomaly detection
    circuit_breaker.ex        # Consecutive error protection
    audit.ex                  # Full argument/result logging

Implementation Phases

Phase	What	Effort	Impact
1	Session GenServer + Registry + integration into router	1 day	Foundation for everything else
2	Rate limiter + circuit breaker + argument logging	1 day	Prevents runaway loops and budget explosions
3	Argument guard + result size limiting	Half day	Security: injection scanning, cost: token savings
4	Budget accumulator + cost model	Half day	Financial control per session
5	Sequence detector + velocity monitor	1 day	Data exfiltration detection, anomaly alerting
6	Admin tools (MCP + REST)	Half day	Operational visibility and control
Total		~4.5 days

What This Doesn’t Cover

The agent’s LLM calls. The gateway governs tool access and session behavior. It doesn’t see what the agent sends to its LLM. For that, you need the LLM Gateway – a separate layer where agents point their API base URL at WorkingAgents.

Agent-side behavior modification. You can’t make Claude Code retry differently or change its reasoning. You can only control what it accesses through your tools and cut it off when something goes wrong.

Multi-session correlation. A user could open 5 sessions to circumvent per-session rate limits. V1 doesn’t correlate across sessions. V2 should add per-user aggregate limits across all active sessions.

The Complete Gateway Stack

LLM Gateway (/llm-gateway)           -- proxies agent-to-LLM traffic
  |                                      sees prompts, responses, tokens, cost
  |
External Agent
  |
MCP Transport (/sse, /mcp)           -- authenticates, manages connection
  |
Agent Gateway (Session Manager)      -- session lifecycle, budget, patterns
  |
  +-- Call-Level Proxy               -- argument guard, rate limit, injection scan
  |
  +-- Permission Guards              -- capability-based key check (existing)
  |
  +-- Tool Execution                 -- business logic (existing)
  |
  +-- Post-Flight                    -- result scan, size limit, audit

Three layers, three concerns:

LLM Gateway – what the agent says to the model (optional, requires agent cooperation)
Agent Gateway – what the agent does through your tools (mandatory, transparent to agent)
Permission Guards – whether the agent is allowed to do it at all (existing, built)