External agents – Claude Code, OpenAI Codex, Gemini CLI, custom agents on LangChain or CrewAI – connect to WorkingAgents through MCP endpoints. Today, the MCP transport authenticates the agent, checks permissions on each tool call, and returns results. That’s access control.
A gateway is more than access control. It manages the full session lifecycle: how many calls, how fast, how much it costs, what patterns emerge, and when to cut it off. The difference matters because external agents are autonomous. They decide what to call and when. Without session-level governance, a single agent can exhaust resources, exfiltrate data through multi-call sequences, or loop until the budget is gone.
This article lays out the complete design for an external agent gateway in WorkingAgents.
What Exists Today
| Layer | What it does | What it doesn’t do |
|---|---|---|
| MCP Transport (/sse, /mcp) | Authenticates via bearer token or session cookie. Maintains session ID in Registry. Routes JSON-RPC to handler. | No session state beyond user_id. No tracking of call history, cost, or patterns. |
| Permission Guards (Permissions.*) | Capability-based key check on every tool call. Guard clause at the function head. O(1). | Only answers “is this allowed?” Not “is this safe?” or “is this expensive?” or “is this suspicious?” |
| Tool Audit (ToolAudit) | Logs tool name, user, status, duration to SQLite. | Does not log arguments or results. Cannot correlate calls into sessions or detect patterns. |
What the Gateway Adds
The gateway sits between the MCP transport and tool execution. It operates at two levels:
- Call level – inspect and gate each individual tool call (arguments, rate limits, injection scanning).
- Session level – track state across the lifetime of an MCP connection (call history, cost accumulator, pattern detection, circuit breaking).
External Agent
  -> MCP Transport (auth, session ID)
  -> Gateway Session Manager (per-session state)
    -> Call-Level Proxy
      -> PreFlight: argument guard, rate limit, injection scan
      -> Permission Guard (existing)
      -> Tool Execution (existing)
      -> PostFlight: result scan, size limit, audit
    -> Session-Level Checks
      -> Budget accumulator
      -> Sequence detector
      -> Velocity monitor
      -> Circuit breaker
  -> Result returned to agent
Architecture
Session Manager
Each MCP session gets a lightweight process that tracks session state. Created when the agent connects, destroyed on disconnect or idle timeout.
defmodule AgentGateway.Session do
  # :transient so that a normal exit (idle timeout) is not restarted by the
  # supervisor; only crashes are. The default (:permanent) would restart
  # every session that times out.
  use GenServer, restart: :transient

  defstruct [
    :session_id,
    :user_id,
    :username,
    :connected_at,
    call_count: 0,
    cost_usd: 0.0,
    call_history: [],        # last N tool names for pattern detection
    consecutive_errors: 0,
    circuit_open: false,
    circuit_opened_at: nil,  # set when the circuit opens; read by check_circuit/1
    last_call_at: nil
  ]

  @max_history 50
  @idle_timeout_ms 1_800_000  # 30 minutes

  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts)
  end

  # Synchronous call with a 120-second timeout to allow for slow tools.
  def call_tool(session_pid, name, args, user) do
    GenServer.call(session_pid, {:call_tool, name, args, user}, 120_000)
  end

  @impl GenServer
  def init(opts) do
    state = %__MODULE__{
      session_id: opts[:session_id],
      user_id: opts[:user_id],
      username: opts[:username],
      connected_at: System.system_time(:millisecond)
    }

    # The third element arms the idle timeout; every reply below re-arms it.
    {:ok, state, @idle_timeout_ms}
  end

  @impl GenServer
  def handle_call({:call_tool, name, args, user}, _from, state) do
    with :ok <- check_circuit(state),
         :ok <- check_rate(state),
         :ok <- check_budget(state),
         {:ok, args} <- AgentGateway.PreFlight.run(state, name, args),
         # A bare variable always matches, so tool errors pass through to PostFlight.
         result <- execute_tool(name, args, user),
         {:ok, result} <- AgentGateway.PostFlight.run(state, name, result) do
      new_state = record_success(state, name, result)
      AgentGateway.Audit.log(new_state, name, args, result, :ok)
      {:reply, result, new_state, @idle_timeout_ms}
    else
      # Every check above must fail with {:rejected, reason}; any other shape
      # raises WithClauseError.
      {:rejected, reason} = rejection ->
        new_state = record_rejection(state, name, reason)
        AgentGateway.Audit.log(new_state, name, args, nil, rejection)
        {:reply, {:error, format_error(reason)}, new_state, @idle_timeout_ms}
    end
  end

  @impl GenServer
  def handle_info(:timeout, state) do
    # Idle timeout -- session dies, resources freed
    {:stop, :normal, state}
  end

  # Private helpers (check_circuit/1, check_rate/1, check_budget/1,
  # execute_tool/3, record_success/3, record_rejection/3, format_error/1)
  # are not shown here.
end
Key design choice: the session is a GenServer, not a data structure in ETS. This gives us:
- Process isolation – one session crashing doesn’t affect others
- Supervision – crashed sessions restart cleanly
- Message queue – concurrent tool calls from the same agent serialize naturally
- Idle timeout – inactive sessions self-terminate
Session Registry
Sessions are tracked in an Elixir Registry, keyed by session ID. The MCP transport already stores session IDs in McpSessionRegistry. The gateway extends this with per-session processes.
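defmodule AgentGateway.SessionRegistry do
  def start_session(session_id, user_id, username) do
    case DynamicSupervisor.start_child(
           AgentGateway.SessionSupervisor,
           {AgentGateway.Session, session_id: session_id, user_id: user_id, username: username}
         ) do
      {:ok, pid} ->
        # Registry.register/3 registers the *calling* process under session_id
        # and stores the session pid as the value, so the entry lives as long
        # as the caller (the transport process), not the session.
        Registry.register(AgentGateway.Registry, session_id, pid)
        {:ok, pid}

      error ->
        error
    end
  end

  def get_session(session_id) do
    case Registry.lookup(AgentGateway.Registry, session_id) do
      [{_owner, pid}] ->
        # The entry can outlive the session (e.g. after an idle timeout),
        # so confirm the session process is still running.
        if Process.alive?(pid), do: {:ok, pid}, else: {:error, :session_not_found}

      [] ->
        {:error, :session_not_found}
    end
  end

  # Returns a list of {session_id, session_pid} tuples.
  def active_sessions do
    Registry.select(AgentGateway.Registry, [{{:"$1", :"$2", :"$3"}, [], [{{:"$1", :"$3"}}]}])
  end
end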
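Both AgentGateway.Registry and AgentGateway.SessionSupervisor have to be running before start_session/3 is called. A minimal sketch of the supervision tree, assuming a top-level module named AgentGateway.Supervisor (a hypothetical name, not part of the existing codebase) and a unique-key registry:

```elixir
defmodule AgentGateway.Supervisor do
  # Hypothetical top-level supervisor for the gateway processes.
  use Supervisor

  def start_link(opts \\ []) do
    Supervisor.start_link(__MODULE__, opts, name: __MODULE__)
  end

  @impl Supervisor
  def init(_opts) do
    children = [
      # Unique keys: at most one entry per session ID.
      {Registry, keys: :unique, name: AgentGateway.Registry},
      # Sessions are started on demand with DynamicSupervisor.start_child/2.
      {DynamicSupervisor, name: AgentGateway.SessionSupervisor, strategy: :one_for_one}
    ]

    # :rest_for_one restarts the session supervisor if the registry crashes,
    # so no session keeps running without a registry to look it up in.
    Supervisor.init(children, strategy: :rest_for_one)
  end
end
```

In an application, this module would typically be listed among the children of the main application supervisor.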
Integration Point
One change in my_mcp_server_router.ex: call_tool/3 gains a session_id argument (becoming call_tool/4) and routes through the session:
Before:
defp call_tool(name, args, user) do
  MyMCPServer.Manager.call_tool(name, args, user.id)
end
After:
defp call_tool(name, args, user, session_id) do
  case AgentGateway.SessionRegistry.get_session(session_id) do
    {:ok, session_pid} ->
      AgentGateway.Session.call_tool(session_pid, name, args, user)

    {:error, :session_not_found} ->
      {:error, %{code: -32000, message: "Session expired"}}
  end
end
The session ID is already available in both transport handlers (/sse and /mcp). It just needs to be passed through to call_tool.
Call-Level Features
Argument Guard
Scans tool arguments for injection patterns. Same design as the MCP proxy article – SQL injection, path traversal, prompt injection regex. Free-text tools (search, send message) are excluded from injection scanning to avoid false positives.
Rate Limiter
Per-session rate limiting with configurable limits per tool:
@default_rpm 60
@tool_limits %{
  "knowledge_search" => 20,
  "summary_request" => 5,
  "agentmail_send_message" => 10,
  "whatsapp_send" => 5,
  "fetch_url" => 15
}
Implementation: sliding window counter in the session state. No ETS needed – the session GenServer owns its own state.
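One way the sliding window could look, as a sketch rather than the actual implementation. It assumes the session state keeps a map from tool name to recent call timestamps, and that @default_rpm applies to any tool without its own limit. The module name RateWindowSketch and the function check/3 are illustrative only.

```elixir
defmodule RateWindowSketch do
  @window_ms 60_000
  @default_rpm 60
  # Subset of the per-tool limits listed above.
  @tool_limits %{"knowledge_search" => 20, "summary_request" => 5}

  # `calls` maps tool name => list of call timestamps in ms, newest first.
  # Returns {:ok, updated_calls} or {:rejected, :rate_limited}.
  def check(calls, tool, now_ms) do
    limit = Map.get(@tool_limits, tool, @default_rpm)

    # Keep only timestamps inside the window. The list is newest first,
    # so we can stop at the first one that is too old.
    recent =
      calls
      |> Map.get(tool, [])
      |> Enum.take_while(&(now_ms - &1 < @window_ms))

    if length(recent) < limit do
      {:ok, Map.put(calls, tool, [now_ms | recent])}
    else
      {:rejected, :rate_limited}
    end
  end
end
```

Because old timestamps are dropped on every check, the per-tool list never grows beyond the tool's limit.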
Result Size Limiting
Tool results can be arbitrarily large. A knowledge_get on a 50KB document returns the full content. The agent sends that entire result to the LLM in the next prompt, burning tokens.
The gateway truncates results above a configurable threshold and appends a notice:
[Result truncated to 8000 characters. Use knowledge_get with the document ID for the full content.]
This saves the agent (and the user) money without breaking functionality.
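A minimal sketch of the truncation step, assuming results are plain strings and the threshold is counted in characters (8,000, matching the notice above). ResultLimiterSketch and limit/2 are hypothetical names.

```elixir
defmodule ResultLimiterSketch do
  @max_chars 8_000

  # Returns the text unchanged when it fits, otherwise the first
  # `max_chars` characters followed by a notice for the agent.
  def limit(text, max_chars \\ @max_chars) when is_binary(text) do
    if String.length(text) <= max_chars do
      text
    else
      String.slice(text, 0, max_chars) <>
        "\n[Result truncated to #{max_chars} characters. " <>
        "Use knowledge_get with the document ID for the full content.]"
    end
  end
end
```

String.slice/3 counts graphemes rather than bytes, so multi-byte characters are never cut in half.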
Argument and Result Logging
Every tool call is logged with full arguments and a result summary:
%{
  session_id: "abc123",
  user_id: 1769919584059,
  username: "james",
  tool_name: "knowledge_search",
  args: %{"query" => "deployment checklist", "k" => 5},
  result_summary: "5 results, top: 'Deployment Checklist' (distance: 0.23)",
  status: :ok,
  duration_ms: 142,
  call_number: 7,          # 7th call in this session
  session_cost_usd: 0.04,
  created_at: 1742284800000
}
This goes beyond ToolAudit, which only logs name, user, status, and duration. The gateway audit captures what was asked and what came back.
Session-Level Features
Budget Accumulator
Each session tracks cumulative cost. Cost is estimated per tool call based on a configurable cost model:
| Tool category | Estimated cost per call |
|---|---|
| Embedding search (knowledge_search, blog_search) | $0.002 |
| Text search (knowledge_search_text, blog_search_text) | $0.0001 |
| Read operations (knowledge_get, nis_get_contact) | $0.0001 |
| Write operations (knowledge_add, nis_create_contact) | $0.001 |
| External calls (fetch_url, agentmail_send_message) | $0.005 |
| Expensive operations (summary_request, blog_import) | $0.01 |
When cumulative cost exceeds the session budget (configurable per user or per role), subsequent calls are rejected:
{"error": "Session budget exceeded ($5.00 limit, $5.02 spent). Start a new session or contact an administrator."}
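The accumulator could be sketched as follows, under a few assumptions: costs are looked up by tool name, unlisted tools fall back to a default cost (a placeholder value, not from the cost model above), and a call is rejected only once spending has already passed the limit, which matches the $5.02 vs. $5.00 example. BudgetSketch and its functions are illustrative names.

```elixir
defmodule BudgetSketch do
  # Per-call costs taken from the table above (subset).
  @costs %{
    "knowledge_search" => 0.002,
    "knowledge_get" => 0.0001,
    "knowledge_add" => 0.001,
    "fetch_url" => 0.005,
    "summary_request" => 0.01
  }
  # Assumption: fallback for tools not in the cost model.
  @default_cost 0.001

  def cost(tool), do: Map.get(@costs, tool, @default_cost)

  # Called before a tool runs: reject when the session is already over budget.
  def check(spent_usd, limit_usd) when spent_usd > limit_usd do
    {:rejected, {:budget_exceeded, limit_usd, spent_usd}}
  end

  def check(_spent_usd, _limit_usd), do: :ok

  # Called after a tool runs: add its estimated cost to the running total.
  def charge(spent_usd, tool), do: spent_usd + cost(tool)
end
```

Floats keep the sketch short. An implementation that needs exact totals might accumulate integer micro-dollars instead.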
Sequence Detector
Tracks the last N tool calls and flags suspicious patterns:
Read-then-exfiltrate:
- knowledge_get followed by agentmail_send_message – reading internal docs then emailing them out
- nis_get_contact followed by fetch_url – reading CRM data then sending it to an external URL
- access_control_audit_log followed by whatsapp_send – reading security logs then messaging them
Enumeration:
- 10+ consecutive knowledge_search calls with different queries – systematic content extraction
- 5+ consecutive nis_list_contacts or nis_list_companies calls – CRM data harvesting
Privilege probing:
- Repeated calls to tools that return {:not_allowed, _} – agent testing what it can access
V1 logs warnings. V2 can block or require human approval for flagged sequences.
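A sketch of how two of these patterns might be checked against the session's call history. It assumes the history is a list of tool names with the newest call first, and that "followed by" means immediately followed by. Since only names are stored, it cannot tell whether the search queries differ. SequenceSketch and flags/1 are hypothetical names.

```elixir
defmodule SequenceSketch do
  # {read_tool, send_tool} pairs from the read-then-exfiltrate list.
  @exfil_pairs [
    {"knowledge_get", "agentmail_send_message"},
    {"nis_get_contact", "fetch_url"},
    {"access_control_audit_log", "whatsapp_send"}
  ]
  @search_run_threshold 10

  # `history` is a list of tool names, newest first.
  # Returns a list of flags; an empty list means nothing suspicious.
  def flags(history) do
    exfil_flags(history) ++ enumeration_flags(history)
  end

  # Flag when the latest call is a "send" directly preceded by its "read".
  defp exfil_flags([latest, previous | _]) do
    if {previous, latest} in @exfil_pairs, do: [:read_then_exfiltrate], else: []
  end

  defp exfil_flags(_history), do: []

  # Flag when the most recent calls are an unbroken run of searches.
  defp enumeration_flags(history) do
    run = history |> Enum.take_while(&(&1 == "knowledge_search")) |> length()
    if run >= @search_run_threshold, do: [:enumeration], else: []
  end
end
```

In V1 the returned flags would only be written to the audit log. Blocking on them is the V2 behavior described above.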
Velocity Monitor
Tracks call frequency over sliding windows. Alerts on anomalies:
- Spike detection: Agent goes from 5 calls/minute (normal) to 50 calls/minute (abnormal). Threshold: 5x the session’s rolling average.
- Sustained high rate: Agent maintains 30+ calls/minute for 5+ minutes. Normal interactive use is 2-10 calls/minute.
- Off-hours activity: Agent making calls at 3 AM when the user typically works 9-5. (Requires user timezone from profile.)
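Spike detection can be sketched as a comparison between the current minute and the rolling average. This assumes the session keeps per-minute call counts for completed minutes. VelocitySketch and spike?/2 are hypothetical names.

```elixir
defmodule VelocitySketch do
  # Threshold from the text: 5x the session's rolling average.
  @spike_factor 5

  # `current` is the call count for the minute in progress;
  # `per_minute` holds counts for earlier, completed minutes.
  # With no history there is no baseline, so nothing is flagged.
  def spike?(_current, []), do: false

  def spike?(current, per_minute) do
    average = Enum.sum(per_minute) / length(per_minute)
    average > 0 and current >= @spike_factor * average
  end
end
```

With a baseline of 5 calls/minute, 50 calls in the current minute is flagged, while 20 is not.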
Circuit Breaker
Consecutive errors trip the circuit. The agent can’t make more calls until the circuit resets:
@error_threshold 5
@circuit_reset_ms 60_000

# Requires a circuit_opened_at field in the session state, set to
# System.system_time(:millisecond) at the moment the circuit opens.
defp check_circuit(%{circuit_open: true, circuit_opened_at: opened_at}) do
  if System.system_time(:millisecond) - opened_at > @circuit_reset_ms do
    :ok  # half-open: allow one call to test
  else
    {:rejected, :circuit_open}
  end
end

defp check_circuit(%{consecutive_errors: n}) when n >= @error_threshold do
  {:rejected, :circuit_open}
end

defp check_circuit(_state), do: :ok
Five consecutive errors: circuit opens. Agent gets an error response explaining the circuit is open. After 60 seconds, one call is allowed through (half-open). If it succeeds, the circuit closes. If it fails, the circuit stays open for another 60 seconds.
This prevents an agent stuck in a retry-on-failure loop from generating hundreds of failed calls.
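The state transitions described above (open after five errors, close on a successful probe, stay open on a failed one) might be recorded like this. It is a sketch under the assumption that the session state carries consecutive_errors, circuit_open, and circuit_opened_at. CircuitSketch and record/3 are hypothetical names.

```elixir
defmodule CircuitSketch do
  @error_threshold 5

  # A success closes the circuit and clears the error streak.
  def record(state, :ok, _now_ms) do
    %{state | consecutive_errors: 0, circuit_open: false, circuit_opened_at: nil}
  end

  # An error extends the streak. At or past the threshold the circuit is
  # (re)opened and the reset clock restarts, which also covers a failed
  # half-open probe.
  def record(state, :error, now_ms) do
    errors = state.consecutive_errors + 1

    if errors >= @error_threshold do
      %{state | consecutive_errors: errors, circuit_open: true, circuit_opened_at: now_ms}
    else
      %{state | consecutive_errors: errors}
    end
  end
end
```

Passing the current time in as an argument keeps the functions pure, which makes the transitions easy to test without waiting 60 seconds.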
Session Lifecycle
Agent connects (/sse or /mcp)
  -> MCP transport authenticates user
  -> AgentGateway.SessionRegistry.start_session(session_id, user_id, username)
  -> Session GenServer starts, begins tracking state

Agent calls tools
  -> Each call routes through AgentGateway.Session.call_tool/4
    -> Call-level checks (argument guard, rate limit)
    -> Session-level checks (budget, circuit breaker)
    -> Permission check (existing guards)
    -> Tool execution
    -> Post-flight checks (result scan, size limit)
    -> Session state updated (call count, cost, history)
    -> Audit logged

Agent disconnects or goes idle
  -> 30-minute idle timeout triggers :timeout
  -> Session GenServer terminates normally
  -> Final session summary logged:
     - Total calls, total cost, duration, tools used, errors, rejections
  -> Registry entry cleaned up
Admin Visibility
The gateway exposes session data through MCP tools for the admin:
gateway_sessions -- list all active sessions (user, connected_at, call_count, cost)
gateway_session_detail -- get full call history for a session
gateway_kill_session -- force-terminate a session
gateway_set_budget -- set per-user session budget
gateway_set_rate_limit -- override rate limits for a user
These would be admin-only tools gated by Permissions.Admin.
File Structure
lib/
  agent_gateway/
    session.ex              # Per-session GenServer
    session_registry.ex     # Session lookup and lifecycle
    session_supervisor.ex   # DynamicSupervisor for sessions
    pre_flight.ex           # Call-level pre-checks
    post_flight.ex          # Call-level post-checks
    argument_guard.ex       # Injection scanning
    result_limiter.ex       # Result size truncation
    budget.ex               # Cost model and accumulator
    sequence_detector.ex    # Multi-call pattern detection
    velocity_monitor.ex     # Call frequency anomaly detection
    circuit_breaker.ex      # Consecutive error protection
    audit.ex                # Full argument/result logging
Implementation Phases
| Phase | What | Effort | Impact |
|---|---|---|---|
| 1 | Session GenServer + Registry + integration into router | 1 day | Foundation for everything else |
| 2 | Rate limiter + circuit breaker + argument logging | 1 day | Prevents runaway loops and budget explosions |
| 3 | Argument guard + result size limiting | Half day | Security: injection scanning, cost: token savings |
| 4 | Budget accumulator + cost model | Half day | Financial control per session |
| 5 | Sequence detector + velocity monitor | 1 day | Data exfiltration detection, anomaly alerting |
| 6 | Admin tools (MCP + REST) | Half day | Operational visibility and control |
| Total | | ~4.5 days | |
What This Doesn’t Cover
The agent’s LLM calls. The gateway governs tool access and session behavior. It doesn’t see what the agent sends to its LLM. For that, you need the LLM Gateway – a separate layer where agents point their API base URL at WorkingAgents.
Agent-side behavior modification. You can’t make Claude Code retry differently or change its reasoning. You can only control what it accesses through your tools and cut it off when something goes wrong.
Multi-session correlation. A user could open 5 sessions to circumvent per-session rate limits. V1 doesn’t correlate across sessions. V2 should add per-user aggregate limits across all active sessions.
The Complete Gateway Stack
LLM Gateway (/llm-gateway)          -- proxies agent-to-LLM traffic
    |                                  sees prompts, responses, tokens, cost
    |
External Agent
    |
MCP Transport (/sse, /mcp)          -- authenticates, manages connection
    |
Agent Gateway (Session Manager)     -- session lifecycle, budget, patterns
    |
    +-- Call-Level Proxy            -- argument guard, rate limit, injection scan
    |
    +-- Permission Guards           -- capability-based key check (existing)
    |
    +-- Tool Execution              -- business logic (existing)
    |
    +-- Post-Flight                 -- result scan, size limit, audit
Three layers, three concerns:
- LLM Gateway – what the agent says to the model (optional, requires agent cooperation)
- Agent Gateway – what the agent does through your tools (mandatory, transparent to agent)
- Permission Guards – whether the agent is allowed to do it at all (existing, built)