When you deploy AI agents inside an organization, they talk to LLM providers constantly – sending system prompts, tool results, conversation history, and receiving model responses. All of that traffic flows out to Anthropic, OpenAI, or Google with zero visibility on your end. You have no record of what was sent, no way to detect when something went wrong, and no ability to enforce policy.
The LLM Gateway changes that.
What It Does
The LLM Gateway is a reverse proxy that sits between your agents and the LLM providers. Agents point their API base URL at WorkingAgents instead of the provider directly:
export ANTHROPIC_BASE_URL=https://your-server:8443/llm-gateway/anthropic
export OPENAI_BASE_URL=https://your-server:8443/llm-gateway/openai
From the agent’s perspective, nothing changes. Requests flow through transparently to the real provider. But the gateway observes every byte of every exchange and writes an audit record: provider, model, HTTP status, duration, request size, response size, SHA256 hashes, content previews, and any detected security warnings.
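To make the shape of that audit record concrete, here is a minimal sketch in Python. The field names and the `build_record` helper are illustrative assumptions, not the gateway's actual schema, but they cover the dimensions the text lists: provider, model, status, duration, sizes, hashes, previews, and warnings.

```python
import hashlib
import time
from dataclasses import dataclass

# Hypothetical audit-record shape; field names are illustrative,
# not the gateway's real database schema.
@dataclass
class AuditRecord:
    provider: str          # "anthropic" | "openai" | "google"
    model: str
    status: int            # HTTP status returned by the upstream provider
    duration_ms: int
    request_bytes: int
    response_bytes: int
    request_sha256: str
    response_sha256: str
    request_preview: str   # first PREVIEW_LEN characters of the payload
    warnings: list         # security-scanner findings, if any

PREVIEW_LEN = 200

def build_record(provider, model, status, started, req, resp, warnings):
    """Assemble one audit record from the raw request/response bytes."""
    return AuditRecord(
        provider=provider,
        model=model,
        status=status,
        duration_ms=int((time.monotonic() - started) * 1000),
        request_bytes=len(req),
        response_bytes=len(resp),
        request_sha256=hashlib.sha256(req).hexdigest(),
        response_sha256=hashlib.sha256(resp).hexdigest(),
        request_preview=req[:PREVIEW_LEN].decode("utf-8", "replace"),
        warnings=warnings,
    )
```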
It supports all three major providers – Anthropic, OpenAI, and Google Gemini – with a single unified audit trail.
Why This Is the Foundation
The audit log is the prerequisite for everything else. You cannot enforce policy you cannot see. You cannot detect anomalies in traffic you never recorded. You cannot attribute costs to teams or projects without a per-request record.
This first phase establishes the observation layer. It proves that the proxy operates at the stream level with O(1) memory overhead (no buffering of the full payload), that timing data is accurate, that hashes are consistent, and that records reach the database reliably. Getting this right before adding enforcement is the correct order of operations.
There is also a security scanner running on every request payload. It checks for credential leakage patterns (API keys, passwords, secrets appearing in prompts) and prompt injection signatures (classic jailbreak language, system prompt overrides, special instruction tokens). Right now these generate warnings in the audit log. That is intentional – Phase 1 observes and records, it does not block.
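A warn-only scanner of this kind can be sketched as a small rule table of regular expressions. The patterns below are illustrative examples only (a real scanner would carry a much larger, tuned rule set), but they show the two families the text describes: credential leakage and prompt-injection signatures.

```python
import re

# Illustrative rules only; not the gateway's actual pattern set.
CREDENTIAL_PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "possible OpenAI-style API key"),
    (re.compile(r"(?i)password\s*[:=]\s*\S+"), "possible password"),
    (re.compile(r"AKIA[0-9A-Z]{16}"), "possible AWS access key"),
]
INJECTION_PATTERNS = [
    (re.compile(r"(?i)ignore (all )?previous instructions"), "injection phrase"),
    (re.compile(r"(?i)you are now DAN"), "jailbreak signature"),
]

def scan(payload: str) -> list:
    """Return a list of warning labels; in Phase 1 these are only
    written to the audit log, never used to block the request."""
    warnings = []
    for pattern, label in CREDENTIAL_PATTERNS + INJECTION_PATTERNS:
        if pattern.search(payload):
            warnings.append(label)
    return warnings
```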
What Needs to Come Next
The current implementation is a solid foundation, but it is not yet production-secure. Several critical capabilities are missing:
Authentication and authorization. The gateway currently forwards any request that reaches it. There is no check that the agent is who it claims to be, no association between a request and a user or team, and no permission check before proxying. Without this, any process that can reach the gateway can use any provider key loaded on the server.
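The missing check amounts to a gate in front of the proxy: resolve the caller to an identity before forwarding anything, and reject the request if that fails. A minimal sketch, assuming a bearer-token scheme and an in-memory token registry (a real system would back this with the access control database):

```python
# Hypothetical registry mapping internal tokens to agent identities;
# the token format and fields are assumptions for illustration.
TOKENS = {
    "wa_token_abc123": {"agent": "billing-bot", "team": "finance"},
}

def authenticate(headers: dict):
    """Return the caller's identity, or None if the request should be
    rejected before it is proxied to any provider."""
    auth = headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        return None
    return TOKENS.get(auth.removeprefix("Bearer "))
```

With an identity attached, every audit record can also be attributed to an agent and a team, which the cost-tracking work below depends on.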
API key management. Right now, agents send their own provider keys. The more secure model is for agents to authenticate to the gateway with an internal token, and the gateway substitutes the real provider key server-side. Agent credentials never leave the organization, keys can be rotated centrally, and compromised agents cannot exfiltrate the provider API key.
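Server-side key substitution is a small transformation on the outbound headers: strip whatever credential the agent sent and inject the real provider key, which only the gateway holds. The provider header names below are the real ones (`x-api-key` for Anthropic, `Authorization: Bearer` for OpenAI); the environment-variable names are assumptions for the sketch.

```python
import os

def substitute_key(headers: dict, provider: str) -> dict:
    """Drop the agent's internal credential and inject the real
    provider key server-side. Env-var names are illustrative."""
    out = {k: v for k, v in headers.items()
           if k.lower() not in ("authorization", "x-api-key")}
    if provider == "anthropic":
        out["x-api-key"] = os.environ["GATEWAY_ANTHROPIC_KEY"]
    elif provider == "openai":
        out["Authorization"] = "Bearer " + os.environ["GATEWAY_OPENAI_KEY"]
    return out
```

Because the real key never appears in agent configuration, rotating it is a single server-side change, and a compromised agent has nothing to exfiltrate.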
Active enforcement. The injection and credential scanners log warnings today. They need to be wired to blocking logic – a request flagged for credential leakage or injection should be rejected before it reaches the provider, not just noted in a log entry.
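Wiring the scanners to blocking logic is a short step once the warnings exist: check the findings against a blocking policy before the proxy call, and fail closed on a hit. A sketch, with an invented exception type and illustrative warning labels:

```python
class BlockedRequest(Exception):
    """Raised to reject a request before it reaches the provider."""

# Illustrative blocking policy: which warning labels are fatal
# rather than log-only. Labels here are examples, not real ones.
BLOCKING = {"possible password", "injection phrase"}

def enforce(warnings: list):
    """Raise if any scanner finding is in the blocking set; otherwise
    fall through and let the request be proxied (and audited)."""
    hits = [w for w in warnings if w in BLOCKING]
    if hits:
        raise BlockedRequest(", ".join(hits))
```

The audit record would still be written either way, with the block decision recorded alongside the warnings.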
Cost tracking and limits. The response payload contains token usage data. Extracting and aggregating this per agent, per user, and per team enables cost attribution and budget enforcement. Without it, AI spend is invisible until the provider invoice arrives.
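Extracting usage is mostly a matter of normalizing field names across providers. The response fields below are the providers' real ones (Anthropic reports `input_tokens`/`output_tokens`, OpenAI reports `prompt_tokens`/`completion_tokens`); the aggregation-by-team part is a sketch under the assumption that each request already carries a team identity.

```python
from collections import defaultdict

def extract_usage(provider: str, body: dict):
    """Return (input_tokens, output_tokens) from a parsed response
    body, normalizing across provider field names."""
    usage = body.get("usage", {})
    if provider == "anthropic":
        return usage.get("input_tokens", 0), usage.get("output_tokens", 0)
    # OpenAI-style field names
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0)

# Running per-team totals: team -> [input_tokens, output_tokens].
# In production this would be persisted, not held in memory.
totals = defaultdict(lambda: [0, 0])

def record_usage(team: str, provider: str, body: dict):
    inp, out = extract_usage(provider, body)
    totals[team][0] += inp
    totals[team][1] += out
```

Budget enforcement then becomes a comparison against these totals before the request is proxied.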
Policy engine. Which agents are allowed to call which models? Is Claude Opus available to everyone or only to certain roles? Can agents use Google Gemini or only Anthropic? These constraints need a policy layer that the gateway can evaluate per request, tied to the access control system already in place.
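At its simplest, that policy layer is a per-request lookup from the caller's role to the set of models it may use. A sketch, with an entirely made-up rule table (the role and model names are illustrative, not a real configuration):

```python
# Hypothetical policy rules: role -> models that role may call.
POLICY = {
    "engineer": {"claude-sonnet", "gpt-4o"},
    "research": {"claude-sonnet", "claude-opus", "gemini-pro"},
}

def allowed(role: str, model: str) -> bool:
    """Evaluate on every request, after authentication resolves the
    caller's role and before the request is proxied."""
    return model in POLICY.get(role, set())
```

Tying this to the access control system already in place means roles come from the authenticated identity, not from anything the agent asserts about itself.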
Retention and search. The current audit table stores content previews. A full forensic capability requires storing complete request and response payloads (likely compressed, with configurable retention windows) and making them searchable – so that when something goes wrong, you can reconstruct exactly what the agent said and what the model replied.
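The storage side of that capability can be sketched in a few lines: compress full payloads on write, decompress on fetch, and prune anything older than the retention window. The in-memory dict and the 30-day window are stand-ins for a real database and a configurable setting.

```python
import time
import zlib

RETENTION_SECONDS = 30 * 24 * 3600  # illustrative 30-day window

# request_id -> (stored_at, compressed payload); a stand-in for a
# real archive table.
store = {}

def archive(request_id: str, payload: bytes):
    store[request_id] = (time.time(), zlib.compress(payload))

def fetch(request_id: str) -> bytes:
    """Reconstruct the full payload for forensic review."""
    _, blob = store[request_id]
    return zlib.decompress(blob)

def prune(now=None):
    """Drop payloads that have aged out of the retention window."""
    now = time.time() if now is None else now
    expired = [rid for rid, (ts, _) in store.items()
               if now - ts > RETENTION_SECONDS]
    for rid in expired:
        del store[rid]
```

Search would sit on top of this, indexing the content previews already in the audit table and falling back to the compressed archive for full reconstruction.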
The Bigger Picture
Every serious deployment of AI agents inside an organization will eventually need a gateway like this. The question is whether it gets built intentionally, as a first-class piece of infrastructure, or bolted on after an incident reveals that you had no idea what your agents were saying to the models.
Building the observation layer first – before enforcement, before policy, before cost controls – is the right call. You need real traffic data to design the right policies. You need the audit log running in production to understand the failure modes before you start blocking things.
Phase 1 is working. The foundation is there. Now it gets interesting.