By James Aspinwall
The AI stack is splitting into two camps. On one side, managed services like AWS Bedrock and Azure AI that trade control for convenience. On the other, raw Kubernetes deployments that give you everything but make you build the plumbing yourself. TrueFoundry occupies the space between — a platform that runs on your infrastructure, with your data, under your control, but handles the orchestration, security, and governance that nobody wants to build from scratch.
What makes it worth paying attention to: it’s one of the first platforms to treat MCP (Model Context Protocol) as a first-class citizen, not an afterthought.
What TrueFoundry Is
TrueFoundry is a Kubernetes-native platform for deploying, managing, and governing AI workloads. Three gateways form the core:
AI Gateway — a unified proxy to 250+ LLMs (OpenAI, Claude, Gemini, Mistral, self-hosted models) through a single OpenAI-compatible API. Sub-3ms internal latency. Route requests by cost, latency, or availability. Fail over automatically when a provider goes down.
Agent Gateway — a control plane for agentic AI workflows. Multi-step execution with retries, timeouts, and fallbacks. Framework-agnostic — works with LangChain, CrewAI, AutoGen, or your own implementation.
MCP Gateway — an enterprise hub for Model Context Protocol. Centralizes tool access, manages OAuth tokens per user, enforces permissions, and provides audit trails for every tool invocation.
Beyond the gateways: model serving (vLLM, SGLang, TRT-LLM), fine-tuning pipelines, prompt management with versioning, guardrails for content safety and PII, and Cognita — their open-source RAG framework.
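Because the AI Gateway speaks the OpenAI wire format, any OpenAI-compatible client works against it by swapping the base URL. A minimal sketch of what such a request looks like on the wire; the gateway URL, key, and model name here are hypothetical, not TrueFoundry specifics:

```python
import json

def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> dict:
    """Build an OpenAI-compatible chat completion request.

    Any OpenAI-compatible gateway accepts this shape; only the
    base URL and credentials change.
    """
    return {
        "url": f"{base_url}/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_chat_request(
    "https://gateway.example.com",  # hypothetical gateway endpoint
    "your-api-key",
    "gpt-4o",  # the gateway routes the request to the right provider
    [{"role": "user", "content": "Hello"}],
)
```

Point the same client at a different model name and the gateway, not your application code, decides which provider serves it.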
Founded in 2021 by former Facebook engineers and backed by Intel Capital ($19M Series A), TrueFoundry processes 10B+ requests monthly across its customer base.
The MCP Gateway: Why It Matters
MCP is a protocol that standardizes how AI agents connect to external tools and data sources. Without it, every agent-to-tool integration is a custom build. With it, a single protocol handles the connection, authentication, and data exchange.
TrueFoundry’s MCP Gateway solves the enterprise problems that the raw protocol doesn’t address.
The N×M Problem
Without a gateway, every agent needs a direct connection to every tool. Five agents and ten tools means fifty integration points, each with its own credentials, error handling, and access controls. The gateway reduces this to a hub-and-spoke model — agents connect to the gateway, the gateway connects to tools.
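The arithmetic behind the reduction is simple: direct integrations grow multiplicatively, gateway connections additively.

```python
agents, tools = 5, 10

# Direct wiring: every agent integrates with every tool.
direct_integrations = agents * tools   # N x M

# Hub-and-spoke: each agent and each tool connects to the gateway once.
gateway_connections = agents + tools   # N + M

print(direct_integrations, gateway_connections)
```

At five agents and ten tools the gap is 50 versus 15; it widens as either side grows.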
How It Works
Agent (Claude, Cursor, custom)
→ MCP Gateway (auth, routing, guardrails)
→ MCP Server A (GitHub)
→ MCP Server B (Slack)
→ MCP Server C (internal database)
The gateway maintains a centralized registry of MCP servers — both public servers and your own self-hosted ones. Each server registers its authentication requirements (OAuth2, API key, static headers). The gateway handles token management per user across all servers.
Virtual MCP Servers
This is the feature that makes it enterprise-ready. A Virtual MCP Server combines tools from multiple sources into a curated subset with its own permission boundary.
Example: create a “Sales Team” virtual server that exposes read-only access to the CRM, document generation in PandaDoc, and search across the knowledge base — but not database writes, deployment tools, or admin functions. The sales team’s AI agents see only the tools they’re authorized to use.
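Conceptually, a virtual server is an allowlist over the full tool registry. A sketch of the idea, with hypothetical tool and server names rather than TrueFoundry's actual schema:

```python
# Full registry of tools exposed by the underlying MCP servers (hypothetical names).
REGISTRY = {
    "crm.search_contacts": "read",
    "crm.delete_account": "write",
    "pandadoc.generate_doc": "write",
    "kb.search": "read",
    "db.execute_sql": "write",
    "deploy.rollout": "write",
}

# "Sales Team" virtual server: a curated subset with its own permission boundary.
SALES_TEAM_ALLOWLIST = {"crm.search_contacts", "pandadoc.generate_doc", "kb.search"}

def visible_tools(allowlist: set) -> list:
    """Tools an agent connected to this virtual server can discover and call."""
    return sorted(t for t in REGISTRY if t in allowlist)

print(visible_tools(SALES_TEAM_ALLOWLIST))
```

An agent connected to the Sales Team endpoint never learns that `db.execute_sql` or `deploy.rollout` exist, which is a stronger guarantee than denying the call after the fact.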
Token Management
Two token types simplify access:
- Personal Access Token (PAT) — one token per user, grants access to all their authorized tools. No juggling separate credentials per service.
- Virtual Account Token (VAT) — application-level access to specific tool subsets. For service accounts and automated workflows.
A single PAT replaces the scattered API keys, OAuth tokens, and service account credentials that typically accumulate across an organization.
MCP Guardrails
Guardrails execute at three points:
Pre-execution — validate inputs before a tool is called. Block suspicious patterns like SQL injection in database queries or path traversal in file operations.
Real-time — monitor execution and require human approval for high-risk operations. “The agent wants to delete a production database table — approve or deny?”
Post-execution — inspect outputs for sensitive data before returning results to the agent. Redact PII, mask credentials, filter confidential information.
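A custom rule at these hooks can be as simple as a pattern check. A minimal sketch of a pre-execution input validator and a post-execution PII redactor; the hook names and tool names are illustrative, not TrueFoundry's API:

```python
import re

# Pre-execution: reject obvious path traversal in file-tool arguments.
def validate_input(tool: str, args: dict) -> bool:
    if tool.startswith("file.") and ".." in args.get("path", ""):
        return False  # blocked before the tool ever runs
    return True

# Post-execution: redact SSN-shaped strings before the agent sees the result.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_output(text: str) -> str:
    return SSN.sub("[REDACTED-SSN]", text)

print(validate_input("file.read", {"path": "../../etc/passwd"}))
print(redact_output("Contact SSN: 123-45-6789"))
```

Real guardrails layer many such rules, plus model-based classifiers, behind the same two hooks.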
Integration
Connect from any MCP client — Claude Code, Cursor, ChatGPT, or your own agents:
```python
import asyncio

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

transport = StreamableHttpTransport(
    url="https://gateway.example.com/mcp/sales-team/crm/server",
    auth="Bearer your-pat-token",
)

async def main():
    async with Client(transport=transport) as client:
        # Discover the tools this user is authorized to see...
        tools = await client.list_tools()
        # ...and invoke one through the gateway.
        result = await client.call_tool(
            name="search_contacts",
            arguments={"query": "Acme Corp"},
        )

asyncio.run(main())
```
One endpoint, one token, access to every tool the user is authorized for.
Security
Security is where TrueFoundry differentiates from lighter-weight alternatives. Four layers: compliance, data residency, access control, and content safety.
Compliance
- SOC 2 Type 2 — audited controls over security, availability, and confidentiality across an extended evaluation period
- HIPAA — compliant for protected health information workloads
- GDPR — European data protection compliance
Data Residency
TrueFoundry deploys inside your environment. Your VPC, your on-prem data center, your air-gapped network. Data and compute never leave your infrastructure. The control plane can be TrueFoundry-hosted or self-hosted — your choice.
This is the fundamental difference from managed services. With Bedrock, your data flows through AWS. With TrueFoundry, your data stays where it is. The platform orchestrates workloads without extracting data.
Access Control
Four layers of authentication and authorization gate the MCP Gateway:
- Gateway authentication — TrueFoundry API keys or tokens from your identity provider (Okta, Azure AD, Google Workspace)
- Gateway-level authorization — MCP Server Groups define which teams can access which tools
- External service authorization — OAuth2 flows managed per user, per service
- Custom headers — additional authentication for services that need it
Beyond the gateway:
- RBAC and attribute-based access control across the platform
- Per-user, per-service, and per-endpoint rate limiting
- Cost-based and token-based quota enforcement
- Full audit trails for every agent decision and action
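Per-user rate limiting of the kind listed above typically comes down to a token-bucket check at the gateway. A minimal sketch of the mechanism, not TrueFoundry's implementation:

```python
import time

class TokenBucket:
    """Allow `rate` requests per second per user, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per user: 5 requests/sec steady state, burst of 10.
buckets: dict[str, TokenBucket] = {}

def check(user: str) -> bool:
    bucket = buckets.setdefault(user, TokenBucket(rate=5, capacity=10))
    return bucket.allow()
```

The same structure extends to token-based and cost-based quotas by charging the bucket per token or per dollar instead of per request.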
Content Safety
Automated guardrails operate in two modes — validate (reject violations) or mutate (modify to comply):
| Guardrail | What It Does |
|---|---|
| Prompt injection prevention | Blocks “ignore previous instructions” and similar attacks |
| PII detection | Detects and redacts 20+ PII categories (SSNs, credit cards, emails, phones, addresses) |
| Content safety | Hate, self-harm, sexual, and violence detection with configurable thresholds |
| Topic filtering | Block specific topics: medical advice, legal counsel, profanity |
| Custom rules | Python-based rules for organization-specific policies |
Guardrails integrate with Azure Content Safety, OpenAI Moderation, and Amazon Bedrock Guardrails as backends. Mix and match based on your requirements.
How It Compares
vs. AWS Bedrock
Bedrock is convenient if you’re already all-in on AWS and only need the models Bedrock supports. TrueFoundry wins on:
- Cloud agnostic — deploy on AWS, GCP, Azure, or on-prem. No lock-in.
- Any model — deploy any open-source model, not just Bedrock’s curated list.
- MCP native — Bedrock has no MCP gateway. You’d build it yourself.
- Cost visibility — token-level attribution by user, team, and geography. Bedrock gives you a bill; TrueFoundry tells you who spent what and why.
Choose Bedrock if you need zero infrastructure management and AWS is your only cloud. Choose TrueFoundry if you need multi-cloud, custom models, or MCP.
vs. Vercel AI SDK
The Vercel AI SDK is frontend-first — great for building chat UIs that call third-party LLM APIs. TrueFoundry is backend-first — full infrastructure control.
- Self-hosted models — TrueFoundry supports deploying your own models on your GPUs. Vercel depends entirely on external providers.
- Guardrails — TrueFoundry has built-in PII detection, content safety, and prompt injection prevention. Vercel has minimal guardrails.
- Cost model — TrueFoundry reports 30-70% cost reduction through smart routing and caching. Vercel’s serverless pricing can surprise you at scale.
Choose Vercel if you’re building a frontend app that calls ChatGPT. Choose TrueFoundry if you’re building infrastructure that serves models.
vs. LangServe
LangServe deploys LangChain chains as APIs. That’s its entire scope. TrueFoundry provides the infrastructure underneath — model serving, gateway routing, RBAC, observability, and the MCP layer that LangServe doesn’t touch.
They’re complementary. You can deploy a LangChain agent on TrueFoundry and get governance, quotas, and audit trails for free. LangSmith handles observability within LangChain; TrueFoundry handles it across your entire AI stack.
Advantages
Speed. Sub-3ms gateway latency, 350+ requests per second on a single vCPU, horizontally scalable. The gateway doesn’t become your bottleneck.
Cost control. Token-level usage attribution lets you see exactly which team, user, or customer is consuming what. Set budgets per team or per model. Smart routing sends cheap queries to cheaper models and complex queries to capable ones. Semantic caching avoids re-running identical queries.
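Smart routing can be as simple as a heuristic over the request before dispatch. A sketch of the idea with hypothetical model names and thresholds, not TrueFoundry's routing policy:

```python
CHEAP_MODEL = "small-model"       # hypothetical cheap, fast model
CAPABLE_MODEL = "frontier-model"  # hypothetical expensive, capable model

def pick_model(prompt: str, needs_tools: bool) -> str:
    """Route short, tool-free prompts to the cheap model; escalate the rest."""
    if needs_tools or len(prompt.split()) > 200:
        return CAPABLE_MODEL
    return CHEAP_MODEL

print(pick_model("Summarize this sentence.", needs_tools=False))
```

Production routers weigh more signals (latency budgets, provider health, per-team cost caps), but the shape is the same: a policy function that runs before every dispatch.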
Framework freedom. No lock-in to a specific agent framework. Use LangChain today, switch to CrewAI tomorrow, build your own next quarter. The gateway doesn’t care what’s behind the API call.
Observability. Aggregate dashboards for requests, tokens, cost, and latency — broken down by model, team, user, or environment. Request-level inspection shows the full prompt, response, routing decision, and guardrail evaluation. OpenTelemetry traces for distributed debugging.
Progressive adoption. Start with the AI Gateway as a proxy to external LLMs. Add the MCP Gateway when you need tool integration. Deploy self-hosted models when you need data privacy or cost reduction. Each layer adds value without requiring you to adopt the whole platform at once.
Pricing
| Tier | Price | Requests/mo | MCP Servers | Notable |
|---|---|---|---|---|
| Developer | Free | 50K | 5 | 3 users, basic caching, community support |
| Pro | $499/mo | 1M | 20 | RBAC, self-hosted models, semantic caching, alerts |
| Pro Plus | $2,999/mo | 1M | 50 | Virtual MCPs, SSO, custom guardrails, audit logs |
| Enterprise | Custom | 10M+ | Unlimited | VPC/air-gapped deployment, multi-region, SLA |
The free tier is genuinely usable for development and small projects. The jump to Pro makes sense when you need RBAC or self-hosted models. Enterprise is for organizations that need the platform inside their own infrastructure.
Getting Started
- Sign up at truefoundry.com — the free tier requires no credit card.
- Connect a model. Add your OpenAI or Anthropic API key. The AI Gateway immediately gives you a unified endpoint with fallback routing.
- Try the playground. Send prompts, test different models, compare latency and cost.
- Add an MCP server. Connect a tool (GitHub, Slack, a database) through the MCP Gateway. Test it in the Agent Playground.
- Deploy an agent. Use their Python or TypeScript SDK to deploy an agent that calls models through the AI Gateway and tools through the MCP Gateway.
- Add guardrails. Enable PII detection and prompt injection prevention. Test with adversarial inputs.
For self-hosted deployment, install the tfy-agent on your Kubernetes cluster and connect it to the control plane. TrueFoundry handles orchestration; your cluster handles compute.
When TrueFoundry Is the Right Choice
You’re building AI agents that need to call internal tools. The MCP Gateway solves the hardest part — secure, governed, auditable tool access. Without it, you’re building custom integrations and managing credentials yourself.
You need data to stay in your environment. Regulated industries, sensitive data, government contracts — anywhere data residency matters. TrueFoundry deploys inside your perimeter.
You’re running multiple models across multiple providers. The AI Gateway normalizes the interface, handles failover, and gives you cost visibility. One API to rule them all.
You want guardrails without building them. PII detection, prompt injection prevention, content safety, and custom rules — all configurable, all enforced at the gateway level before your application ever sees the data.
If you’re building a simple chatbot that calls one model with no tools, TrueFoundry is overkill. Use the API directly. But the moment your AI needs to interact with your systems — read from databases, call internal services, manage documents, access customer data — the governance layer isn’t optional. TrueFoundry provides that layer without making you build it.