LLM Gateway Implemented: Proxying Agent-to-LLM Traffic With Philter

The LLM Gateway is now implemented in WorkingAgents using Philter, a streaming HTTP proxy library for Elixir that observes request/response bodies with O(1) memory overhead regardless of payload size.

External agents point one environment variable at WorkingAgents and all their LLM traffic flows through with full observability:

# Claude Code
export ANTHROPIC_BASE_URL=https://your-server:8443/llm-gateway/anthropic

# OpenAI Codex
export OPENAI_BASE_URL=https://your-server:8443/llm-gateway/openai

# Gemini CLI
export GEMINI_API_BASE_URL=https://your-server:8443/llm-gateway/google

Why Philter

Previous articles (LLM Proxy Design, Building the Gateway) designed a custom Plug-based reverse proxy with manual streaming. The hardest problem was SSE pass-through – forwarding each token to the agent as it arrives while simultaneously accumulating the full response for audit logging, without buffering the entire response in memory.

Philter solves this out of the box. Its architecture:

  1. A lightweight observer process spawns per request via spawn_link
  2. Each body chunk is simultaneously forwarded to the client and sent to the observer
  3. The observer incrementally builds a SHA256 hash, captures the first 64KB as a preview, and tracks total size
  4. Full body accumulation is opt-in and bounded by max_payload_size
  5. A 500MB streaming response uses the same memory as a 500-byte one
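The per-chunk accounting in steps 3-5 can be sketched with OTP's incremental hashing. This is an illustrative sketch of the technique, not Philter's actual code; the module and function names are made up here:

```elixir
defmodule ObserverSketch do
  @moduledoc "Illustrative per-chunk accounting: incremental SHA256, bounded preview, running size."
  @preview_limit 64 * 1024

  def init do
    %{hash: :crypto.hash_init(:sha256), preview: <<>>, size: 0}
  end

  # Called once per body chunk; memory stays bounded by @preview_limit
  # no matter how large the total stream is.
  def observe(state, chunk) do
    %{
      hash: :crypto.hash_update(state.hash, chunk),
      preview: take_preview(state.preview, chunk),
      size: state.size + byte_size(chunk)
    }
  end

  # Called when the stream ends: finalize the digest.
  def finalize(state) do
    %{
      sha256: state.hash |> :crypto.hash_final() |> Base.encode16(case: :lower),
      preview: state.preview,
      size: state.size
    }
  end

  defp take_preview(preview, _chunk) when byte_size(preview) >= @preview_limit, do: preview

  defp take_preview(preview, chunk) do
    combined = preview <> chunk

    if byte_size(combined) > @preview_limit,
      do: binary_part(combined, 0, @preview_limit),
      else: combined
  end
end
```

Reducing the body chunks through observe/2 and then calling finalize/1 yields the same digest as hashing the whole body at once, without ever holding more than the 64KB preview in memory.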

This is the right tool for the job. Building the same thing from scratch would have been 2-3 days of Plug/Finch plumbing. Philter made it a few hours.

What Was Built

Three files, one dependency

lib/
  llm_gateway.ex                  # Plug router -- mounts Philter.ProxyPlug for each provider
  llm_gateway/
    handler.ex                    # Philter handler -- audit logging, security scanning
    database.ex                   # SQLite schema for audit trail

Dependency added: {:philter, "~> 0.2"} in mix.exs. Philter’s dependencies (Finch, Plug, Jason) were already in the project.

LLMGateway (Plug Router)

Three forward directives mount Philter as a reverse proxy for each LLM provider:

forward "/anthropic", to: Philter.ProxyPlug,
  init_opts: [
    upstream: "https://api.anthropic.com",
    handler: {LLMGateway.Handler, %{}},
    receive_timeout: 120_000,
    max_payload_size: 10_485_760,
    persistable_content_types: ["application/json", "text/event-stream"]
  ]

The receive_timeout is 120 seconds (LLM responses can take a while). The max_payload_size is 10MB (large code generation responses). SSE streams (text/event-stream) are included in persistable content types so the full streamed response is captured for audit.

LLMGateway.Handler

Implements Philter’s three lifecycle callbacks:

handle_request_started/2 – logs the upstream URL. This fires before the request leaves WorkingAgents’ infrastructure.

handle_response_started/2 – logs time-to-first-byte (TTFB) and HTTP status. This fires when the first response headers arrive from the LLM provider.

handle_response_finished/2 – the main audit point. When the full exchange completes, the handler:

  1. Extracts provider and model from the request payload preview
  2. Scans the request preview for security issues:
    • Credentials leaked in system prompts (API keys, passwords, secrets)
    • Injection patterns in tool results (the highest-risk vector)
  3. Logs the complete exchange to SQLite with:
    • SHA256 hashes of both request and response (tamper detection)
    • Byte counts and timing data
    • Content previews (first 64KB captured in memory, truncated to 4KB for storage)
    • Security warnings
    • Provider, model, status, duration
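A simplified sketch of those three callbacks follows. The event-map fields (upstream_url, status, request_preview) and the {:ok, state} return shape are assumptions about Philter's callback contract, and the security scan and SQLite write are elided; the real code lives in handler.ex:

```elixir
defmodule HandlerSketch do
  require Logger

  # Fires before the request leaves the proxy.
  def handle_request_started(event, state) do
    Logger.info("llm-gateway -> #{event.upstream_url}")
    {:ok, Map.put(state, :started_at, System.monotonic_time(:millisecond))}
  end

  # Fires when upstream response headers arrive: log TTFB and status.
  def handle_response_started(event, state) do
    ttfb = System.monotonic_time(:millisecond) - state.started_at
    Logger.info("llm-gateway <- status=#{event.status} ttfb=#{ttfb}ms")
    {:ok, state}
  end

  # Fires when the full exchange completes: extract the model, then audit.
  def handle_response_finished(event, state) do
    model =
      case Regex.run(~r/"model"\s*:\s*"([^"]+)"/, event.request_preview) do
        [_, m] -> m
        _ -> "unknown"
      end

    # Real handler: scan previews for warnings, write the audit row to SQLite.
    {:ok, Map.put(state, :model, model)}
  end
end
```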

LLMGateway.Database

SQLite schema for the audit trail:

CREATE TABLE llm_gateway_audit (
  id INTEGER PRIMARY KEY,
  provider TEXT NOT NULL,
  model TEXT NOT NULL DEFAULT 'unknown',
  status INTEGER NOT NULL DEFAULT 0,
  duration_ms INTEGER NOT NULL DEFAULT 0,
  request_size INTEGER NOT NULL DEFAULT 0,
  response_size INTEGER NOT NULL DEFAULT 0,
  request_hash TEXT NOT NULL DEFAULT '',
  response_hash TEXT NOT NULL DEFAULT '',
  request_preview TEXT NOT NULL DEFAULT '',
  response_preview TEXT NOT NULL DEFAULT '',
  warnings TEXT NOT NULL DEFAULT '[]',
  created_at INTEGER NOT NULL
)

Indexed on provider, model, and created_at for fast querying.
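The implied index set might look like the following (index names here are illustrative, not taken from the repo), together with a typical audit query those indexes serve:

-- Assumed index names; the actual migration may differ.
CREATE INDEX idx_llm_audit_provider   ON llm_gateway_audit (provider);
CREATE INDEX idx_llm_audit_model      ON llm_gateway_audit (model);
CREATE INDEX idx_llm_audit_created_at ON llm_gateway_audit (created_at);

-- Example: recent flagged exchanges for one provider.
SELECT model, status, duration_ms, warnings, created_at
FROM llm_gateway_audit
WHERE provider = 'anthropic'
  AND warnings != '[]'
ORDER BY created_at DESC
LIMIT 20;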

Integration Points

Supervision Tree (application.ex)

Two children added:

{Finch, name: LLMGateway.Finch}      # HTTP connection pool for upstream requests
{Sqler, name: "llm_gateway", register: :llm_gateway_db}   # Audit database

Database schema initialized on startup via LLMGateway.Database.setup_database/1.

Router (my_mcp_server_router.ex)

One line:

forward "/llm-gateway", to: LLMGateway

The gateway is behind the router’s existing require_authentication plug, so agents must provide a bearer token or session cookie.

Configuration (config.exs)

config :philter,
  finch_name: LLMGateway.Finch,
  receive_timeout: 120_000,
  max_payload_size: 10_485_760,
  persistable_content_types: ["application/json", "text/event-stream"]

How It Works End-to-End

Claude Code (ANTHROPIC_BASE_URL=https://server:8443/llm-gateway/anthropic)
  |
  | POST /llm-gateway/anthropic/v1/messages
  | Headers: x-api-key: sk-ant-..., content-type: application/json
  | Body: {model, system, messages, tools, stream: true}
  |
  v
WorkingAgents Router
  -> require_authentication (bearer token)
  -> forward /llm-gateway -> LLMGateway Plug
    -> forward /anthropic -> Philter.ProxyPlug
      -> Handler.handle_request_started (log, timestamp)
      -> Observer spawns (spawn_link, tracks body chunks)
      -> Request body streamed to observer + forwarded to api.anthropic.com
      -> Response headers arrive
      -> Handler.handle_response_started (log TTFB, status)
      -> Response chunks streamed to observer + forwarded to Claude Code
      -> Stream completes
      -> Observer finalized (SHA256 hash, size, preview, timing)
      -> Handler.handle_response_finished
        -> Extract provider/model from request preview
        -> Scan for credentials and injection patterns
        -> Write audit record to SQLite
  |
  v
Claude Code receives response (identical to direct API call)

The agent sees no difference. Same API format, same SSE streaming, same headers, same timing (minus a few milliseconds of proxy overhead). The only observable change is the base URL.

What the Audit Trail Captures

Every agent-to-LLM exchange now produces a record:

provider:         anthropic
model:            claude-sonnet-4-5-20250929
status:           200
duration:         3421ms
request_size:     2847 bytes
response_size:    14832 bytes
request_hash:     a1b2c3d4e5f6... (SHA256)
response_hash:    f7e8d9c0b1a2... (SHA256)
request_preview:  {"model":"claude-sonnet-4-5","messages":[{"role":"user",...
response_preview: {"id":"msg_01X...","content":[{"type":"text","text":"...
warnings:         []

If the request contains a credential in the system prompt:

warnings: ["credential_in_payload"]

If a tool result in the conversation contains injection patterns:

warnings: ["injection_pattern_detected"]
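Both warning checks amount to pattern scans over the request preview. A minimal sketch, with illustrative regexes that stand in for the gateway's actual (broader, tuned) rule set:

```elixir
defmodule ScanSketch do
  # Illustrative patterns only; a real scanner needs a far wider rule set.
  @credential_re ~r/(sk-ant-[A-Za-z0-9_-]+|api[_-]?key\s*[:=]|password\s*[:=]|secret\s*[:=])/i
  @injection_re ~r/(ignore (all )?previous instructions|disregard (the|your) system prompt)/i

  # Returns the warning tags found in a preview string.
  def scan_preview(preview) do
    []
    |> maybe_warn(preview, @injection_re, "injection_pattern_detected")
    |> maybe_warn(preview, @credential_re, "credential_in_payload")
  end

  defp maybe_warn(warnings, text, regex, tag) do
    if Regex.match?(regex, text), do: [tag | warnings], else: warnings
  end
end
```

Because the scan runs over the bounded preview rather than the full body, it inherits the same memory guarantees as the rest of the observer pipeline.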

What This Completes

The three-layer security architecture from the proxy articles is now:

Layer                  | What It Sees                                                                  | Status
LLM Gateway (Philter)  | System prompts, conversations, tool definitions, model responses, token usage | Built
MCP Proxy              | Tool call arguments, tool results, call sequences                             | Designed, not built
Permission Guards      | Whether the user can call the tool at all                                     | Built

The LLM Gateway is the first layer to ship. It provides the visibility that was previously impossible – what agents think, what they say to the model, and what the model says back. Combined with the existing permission system that controls what agents can do through MCP tools, WorkingAgents now governs both sides of the agent’s behavior.
