By James Aspinwall, co-written by Alfred (your trusted AI agent) – February 24, 2026, 11:00
Anthropic just shipped four features that fundamentally change how AI agents call tools. If you’ve been building agents with the standard JSON tool-calling pattern – model emits JSON, server executes, result goes back into context, repeat – these features address the pain points you’ve already hit.
AI Jason covers all four in a solid 14-minute walkthrough: https://www.youtube.com/watch?v=3wglqgskzjQ
Here’s a detailed breakdown of what shipped and what it means for production agents.
The Problem with Classic Tool Calling
The traditional flow is simple: send the model a list of tools (name, description, JSON schema), it returns a JSON tool call, the server executes it, appends the result, and the model decides the next call.
This works for simple tasks. It falls apart for complex ones:
- Inefficiency: A Gmail workflow (search -> list of IDs -> repeated “read email” calls) forces the model to regenerate IDs exactly and shuttle large payloads through context. Every round trip costs tokens and latency.
- Context bloat: APIs like web search or page fetch return huge metadata or full HTML when the agent only needs a few paragraphs or IDs.
- Non-determinism: The model generates JSON every time. Each generation is a chance for errors – wrong parameter formats, inconsistent handling, hallucinated values.
- Bigger context doesn’t fix it: Even with ~1M token windows, effective usable context is far lower. Minimizing what goes into context still matters.
The “write a blog about AI news” example illustrates it well: the agent searches the web, fetches pages as raw HTML, then has to reconstruct relevant content from noise to feed into a writing tool. Most tokens are wasted.
Feature 1: Programmatic Tool Calling
This is the big one. Inspired by the “Executable Code Actions” paper and similar to Cloudflare’s code mode.
Instead of the model being a JSON glue layer that emits one tool call at a time, it gets a code execution environment with access to all available tools.
The model outputs code – TypeScript-style – that:
- Calls multiple tools in sequence or parallel
- Pipes the result of one tool into another
- Uses loops and conditionals to orchestrate workflows deterministically
- Keeps intermediate data inside the execution environment, out of the conversation context
How it works
You add a code execution tool to the model’s tool list – a sandbox for running model-written code. For each other tool, you set an allowed_caller that includes this code execution tool (referenced by a version ID like code_execution_20260120).
When the model gets a task like “query the database for last quarter purchases, then find top five customers by revenue,” its first response is a code snippet that calls the query tool, processes results, calls aggregation tools – all in one execution block.
The runtime extracts all tool uses from the code, executes them, collects results, and sends them back as a tool response. The model resumes execution with actual results and synthesizes the final answer.
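As a sketch, a tool list wired up this way might look like the following. Field names such as allowed_caller and the dated tool type follow the post; the exact Anthropic API shape may differ, and query_database is a hypothetical tool:

```python
# Hypothetical request payload sketch (names follow the post; the real
# Anthropic API shape may differ).
tools = [
    # The sandbox in which the model runs its own orchestration code.
    {"type": "code_execution_20260120", "name": "code_execution"},
    # An ordinary tool, now callable from inside the sandbox.
    {
        "name": "query_database",
        "description": "Run a read-only SQL query against the sales database.",
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
        # Restrict invocation to the code execution environment, per the post.
        "allowed_caller": ["code_execution_20260120"],
    },
]
```

With this shape, the model's first response to the "top five customers" task would be one code block that calls query_database and aggregates the rows inside the sandbox, rather than five separate JSON tool calls.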
Why it matters
- Fewer LLM round-trips – code can batch and parallelize tool calls
- Deterministic orchestration – filtering and branching happen in code, not in probabilistic JSON generation
- 30-50% token savings in workflows with large datasets or deterministic pipelines
- Minimal integration changes – add the code execution tool, set allowed_caller, update your runtime to extract and execute tool uses
LLMs are better at writing code than repeatedly emitting ad-hoc JSON. This plays to the model’s strength.
Feature 2: Dynamic Filtering for Web Fetch
This is a specialized application of programmatic tool calling for the web_fetch tool.
Problem: Traditional web fetch dumps entire HTML pages into context. Scripts, nav bars, footers, ads – all irrelevant, all consuming tokens.
Solution: Anthropic adds a middle layer. When web_fetch is called, filter code runs in a sandbox to extract only the relevant content before it hits the model’s context. The filtering is handled automatically by the code execution infrastructure.
Point to a specific version of web_fetch (e.g., web_fetch-2026209), and the API handles the rest. In testing, this reduces token consumption by about 24% on average for web-fetch-heavy workflows.
The API responses show the code execution steps where the filter extracts specific keys before returning a compact payload. You get what matters, not the entire DOM.
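Anthropic runs this filtering automatically inside its own sandbox, but the idea is easy to illustrate. The sketch below is not Anthropic's filter, just a stand-in showing the kind of extraction that happens before content reaches the model: keep paragraph text, drop scripts, navigation, and footers.

```python
from html.parser import HTMLParser

class ParagraphFilter(HTMLParser):
    """Illustrative stand-in for sandbox filter code: keep <p> text,
    drop <script>, <style>, <nav>, and <footer> subtrees entirely."""
    SKIP = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a skipped subtree
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.paragraphs[-1] += data

def filter_page(html: str) -> str:
    """Return only paragraph text, joined as a compact payload."""
    parser = ParagraphFilter()
    parser.feed(html)
    return "\n\n".join(p.strip() for p in parser.paragraphs if p.strip())
```

Running `filter_page("<nav>Menu</nav><p>Body text.</p><script>x()</script>")` returns just `"Body text."` – the model never sees the nav bar or the script.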
Feature 3: Tool Search and Deferred Loading
This addresses the scaling problem: what happens when your agent has dozens or hundreds of tools across multiple MCP servers.
Problem: Loading all tool schemas into every model call bloats context, even when most tools aren’t relevant to the current request. Many teams abandoned MCP for “skills + CRUD/REST” approaches to save tokens – but that sacrifices schema-driven type safety.
Solution: Instead of including full schemas for every tool, you provide a single tool_search tool that retrieves relevant tool definitions on demand.
- The tool_search schema itself is small (~500 tokens)
- Anthropic claims up to ~80% context savings in setups with many tools
- You mark tools or entire MCP servers as deferred (defer_loading = true)
- The model calls tool_search to fetch definitions only when needed
MCP configuration
In the Anthropic MCP server config, default_config supports defer_loading = true to hide all tools by default. You can override per-action – for example, search_events can have defer_loading = false so it’s always visible while other tools stay deferred.
This is the right pattern if your agent has more than 10 tools or multiple MCP servers and you want type safety without constant token overhead.
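As a sketch, the server entry described above might look like this. The defer_loading and default_config names follow the post; the per-tool override key and the other field names are guesses, and the URL is a placeholder:

```python
# Hypothetical MCP server entry (field names are guesses except
# default_config and defer_loading, which follow the post).
mcp_server = {
    "type": "url",
    "url": "https://example.com/mcp",   # placeholder server URL
    "name": "calendar",
    # Hide every tool's schema from the initial context by default...
    "default_config": {"defer_loading": True},
    # ...but keep one high-traffic tool always visible.
    "configs": {
        "search_events": {"defer_loading": False},
    },
}
```

The net effect: only search_events and the small tool_search schema occupy context up front; everything else loads on demand.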
Feature 4: Tool Use Examples
This improves parameter correctness for complex tools.
Problem: For tools like create_ticket in customer support systems, schemas can be complicated – nested fields, correlated constraints (escalation level determines SLA hours), ambiguous formats (what date format?). Even with good descriptions, models often:
- Misinterpret formats
- Skip optional fields that should be filled
- Mis-handle nested or correlated parameters
Solution: When defining a tool, you can now provide an array of concrete example calls (input_examples) showing properly-formed inputs. The model mimics these examples when populating complex or optional fields.
Most useful when:
- Valid JSON doesn’t guarantee correct semantics
- Tools have many optional parameters
- Deeply nested structures are involved
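Putting that together, here is a hedged sketch of a create_ticket definition with input_examples. The schema fields are illustrative, not Anthropic's actual support-tool schema; only the input_examples key follows the post:

```python
# Hypothetical create_ticket definition (illustrative schema; the
# input_examples field follows the post).
create_ticket_tool = {
    "name": "create_ticket",
    "description": "Open a customer-support ticket.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "due_date": {"type": "string", "description": "ISO 8601 date"},
            "escalation": {
                "type": "object",
                "properties": {
                    "level": {"type": "integer"},
                    "sla_hours": {"type": "integer"},
                },
            },
        },
        "required": ["title"],
    },
    # Well-formed calls for the model to imitate, demonstrating the date
    # format and the correlated escalation-level -> SLA-hours constraint.
    "input_examples": [
        {"title": "Refund not processed", "due_date": "2026-03-01",
         "escalation": {"level": 2, "sla_hours": 24}},
        {"title": "Password reset loop"},  # minimal call, optional fields omitted
    ],
}
```

The two examples carry information the schema alone cannot: which date format to use, and which SLA goes with which escalation level.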
Jason reports accuracy on complex parameter handling went from about 72% to 90% when adding examples. That’s significant for production reliability.
What This Means for WorkingAgents
Our platform exposes 50+ MCP tools across tasks, CRM, WhatsApp, chat, summaries, blogs, and admin. These features are directly relevant:
Programmatic tool calling – A query like “find my overdue tasks, check which contacts are associated, and send them follow-up messages on WhatsApp” currently requires multiple round-trips. With code execution, Claude writes one code block that calls task_query, loops through results, calls nis_get_contact, and calls whatsapp_send – all in a single execution pass.
Tool search – With 50+ tools, we’re already in the range where deferred loading would save significant context. Most requests only need 3-5 tools. Loading all 50 schemas every time is wasteful.
Tool use examples – Our task creation tool has optional fields for priority, tags, due dates, recurrence, and parent IDs. Providing concrete examples of well-formed inputs would reduce the “almost right” parameter errors we see.
Dynamic filtering – The summary_request tool fetches URLs for summarization. Dynamic filtering would strip irrelevant HTML before it hits the summarization pipeline.
These aren’t theoretical improvements. They address problems we’ve already encountered in production.
The Bigger Picture
Jason frames these features as “tool calling 2.0” – and the label fits. The shift from “model emits JSON one call at a time” to “model writes code that orchestrates multiple tools deterministically” is a genuine architectural change.
The key insight: LLMs are better at writing code than at being a JSON serialization layer. Let them write code. Let the code handle orchestration, filtering, and data plumbing. Keep the conversation context clean.
For agent builders, the integration cost is low – add a code execution tool, set allowed_caller on existing tools, update your runtime to extract and execute. The payoff is fewer round-trips, lower token costs, and more reliable tool use.
Worth watching the full video for the implementation details: https://www.youtube.com/watch?v=3wglqgskzjQ