By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 7, 2026, 17:55
The Problem: You Don’t Know What You Don’t Know
Most AI observability tools answer questions you already thought to ask. “How many tool calls failed?” “What’s the average latency?” “Did the agent hallucinate?” These are important, but they assume you know where to look.
Distributional asks a different question: What behavioral patterns exist in your agent’s production data that you haven’t discovered yet?
WorkingAgents orchestrates 50+ MCP tools across CRM, task management, content, and communications. Every day, agents make thousands of decisions — which tool to call, what parameters to pass, how to synthesize results. Somewhere in that data are patterns that explain why some sessions succeed and others don’t, why certain users get better results than others, why performance drifts over time.
Distributional’s DBNL platform finds those patterns through unsupervised statistical analysis. It doesn’t require you to define what “good” looks like upfront. It discovers the behavioral fingerprint of your agents and surfaces deviations, clusters, and shifts you wouldn’t have thought to monitor.
This is a fundamentally different capability from what WorkingAgents has today — and from what most evaluation platforms offer.
What Distributional Brings
Distributional (DBNL) is an adaptive analytics platform for production AI agents. The company was founded in September 2023 by the team behind SigOpt (acquired by Intel in 2020) and is backed by $30M from Andreessen Horowitz, Two Sigma Ventures, and others. The platform is built on a core insight: AI behavior is probabilistic, not deterministic, and testing it requires statistical methods native to that reality.
The Distributional Fingerprint
Every AI application has what Distributional calls a “distributional fingerprint” — its unique baseline mixture of characteristic distributions across behavior dimensions. This fingerprint captures:
- How users interact with the system
- Which topics and intents appear, and in what proportions
- What tool sequences agents follow
- How quality, cost, and latency correlate with each other
- Where behavioral clusters form
When the fingerprint shifts — a new topic cluster emerges, a tool sequence that used to work starts failing, latency correlates with a specific user segment — DBNL surfaces it as an Insight.
The Adaptive Analytics Flywheel
DBNL operates through an eight-step cycle:
Ingest → Enrich → Analyze → Publish → Discover → Investigate → Track → Repeat
- Ingest: Production logs arrive via OpenTelemetry traces, SDK push, or SQL pull
- Enrich: Each log line is augmented with LLM-as-Judge evaluations, NLP metrics, topic classification, embeddings, and custom metrics — creating a rich behavioral vector per interaction
- Analyze: Unsupervised learning and statistical techniques discover patterns — temporal shifts, behavioral clusters, outliers
- Publish: Patterns appear as human-readable Insights and Dashboards
- Discover: Teams review automatically surfaced signals they didn’t know to look for
- Investigate: The Explorer tool enables population and temporal comparisons, drilling into the evidence behind each signal
- Track: Meaningful patterns become saved Segments and custom Metrics for ongoing monitoring
- Repeat: Tracked signals feed back into the enrichment and analysis, deepening future discovery
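The Enrich step is the one worth picturing concretely: each raw log line becomes a fixed set of behavioral features that the later analysis steps operate on. A minimal Python sketch of that idea, with hypothetical field names (DBNL's actual schema will differ):

```python
from dataclasses import dataclass

# Illustrative only: field names and the raw-log shape are assumptions,
# not DBNL's real schema.
@dataclass
class EnrichedInteraction:
    session_id: str
    tool_sequence: list[str]   # ordered tool calls in the interaction
    topic: str                 # e.g. output of a topic classifier
    judge_quality: float       # LLM-as-Judge score in [0, 1]
    latency_ms: float
    cost_usd: float

def enrich(raw: dict) -> EnrichedInteraction:
    """Turn one raw production log line into a behavioral vector."""
    return EnrichedInteraction(
        session_id=raw["session_id"],
        tool_sequence=[c["tool"] for c in raw.get("tool_calls", [])],
        topic=raw.get("topic", "unknown"),       # stand-in for classification
        judge_quality=raw.get("judge_quality", 0.0),
        latency_ms=raw["latency_ms"],
        cost_usd=raw.get("cost_usd", 0.0),
    )
```

Stack enough of these vectors and you have a distribution; the fingerprint is what that distribution looks like in aggregate.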
Three Types of Insights
- Temporal Insights — Behavior that shifts over time. “Tool X response quality dropped 15% this week compared to the 30-day baseline.”
- Segment Insights — Distinct behavioral clusters in the data. “Users who ask about CRM data in the morning get different tool sequences than afternoon users.”
- Outlier Insights — Significant deviations from the norm. “This specific tool chain produced an anomalous pattern that doesn’t match any known cluster.”
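Each insight type reduces to a statistical primitive. Outlier insights, for instance, are at heart a deviation test; a toy z-score version makes the mechanism concrete (DBNL's actual methods are richer and multi-dimensional):

```python
def outliers(values: list[float], z: float = 3.0) -> list[int]:
    """Indices of points more than z standard deviations from the mean.

    Toy single-dimension version for illustration only.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # avoid division issues on constant data
    return [i for i, v in enumerate(values) if abs(v - mean) > z * std]
```

Temporal and segment insights are the same move applied across time windows and across clusters of interactions, respectively.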
Deployment Model
DBNL is free, open, and downloadable. It deploys in your Kubernetes cluster within your VPC. No data leaves your environment. Enterprise features include OpenID Connect SSO, role-based access, and workspace administration.
This matters. For an AI consulting firm deploying agents for clients, “your data stays in your infrastructure” eliminates the security objection before it’s raised.
What WorkingAgents Brings
WorkingAgents (“The Orchestrator”) is an Elixir OTP platform that gives AI agents real tools for business operations:
- 50+ MCP tools — CRM contacts/companies/pipeline, task management with 60+ query functions, content authoring, article summarization, alarm scheduling, system monitoring
- Multi-provider LLM — Claude, OpenRouter, Perplexity, switchable at runtime
- Permission-gated execution — capability-based access control on every tool call
- Google A2A protocol — agent-to-agent task delegation and skill discovery
- WhatsApp bridge — natural language tool invocation via messaging
- Per-user isolation — separate SQLite databases per domain, per user
WorkingAgents generates exactly the kind of rich, multi-dimensional production data that Distributional is designed to analyze — tool calls, user interactions, topic diversity, model switching, and real business outcomes.
Where the Synergy Lives
1. Unsupervised Discovery on Tool Usage Patterns
WorkingAgents has 50+ tools. Users interact with agents in natural language, and the agent decides which tools to call, in what order, with what parameters. Today, there’s no systematic way to understand these tool usage patterns at scale.
Distributional’s unsupervised analysis would discover patterns like:
- Tool sequence clusters — "80% of CRM-related sessions follow the pattern `nis_list_contacts` → `nis_get_contact` → `nis_log_interaction`. But 12% skip directly to `nis_log_interaction`, and those sessions have 40% lower user satisfaction."
- Unused tool discovery — "The `nis_pipeline` tool exists but is called in only 3% of sales-related sessions. Sessions that do use it have 2x higher tool completeness scores."
- Parameter pattern analysis — "When users ask about 'overdue tasks,' the agent calls `task_query` with `name: 'overdue'` 70% of the time but `task_dashboard` 30% of the time. The dashboard path produces higher-quality responses."
These are the patterns you wouldn’t think to monitor because you didn’t know they existed. Traditional observability counts tool calls. Distributional discovers the behavioral relationships between them.
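A first pass at this kind of sequence discovery is plain n-gram counting over session logs, before any clustering is involved. An illustrative sketch (the session data is invented):

```python
from collections import Counter

def sequence_patterns(sessions: list[list[str]], n: int = 2) -> Counter:
    """Count n-grams of consecutive tool calls across sessions."""
    counts = Counter()
    for seq in sessions:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

# Invented logs: two "full" CRM sessions and one "skip" session.
sessions = [
    ["nis_list_contacts", "nis_get_contact", "nis_log_interaction"],
    ["nis_list_contacts", "nis_get_contact", "nis_log_interaction"],
    ["nis_log_interaction"],
]
counts = sequence_patterns(sessions)
```

Join these counts against an outcome signal like a satisfaction score and the "12% skip the lookup step" pattern falls out of the data; the unsupervised part is finding which joins are worth making.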
2. User Behavior Segmentation
WorkingAgents serves different users with different needs. James manages CRM contacts. Jimmy asks about tasks and deadlines. Other consulting clients will have their own patterns. Distributional’s segment discovery would reveal:
- User behavioral profiles — Natural clusters of how different users interact with agents, without predefined user categories
- Intent distribution shifts — When a user’s query patterns change (maybe they started using CRM tools more and task tools less), DBNL surfaces the shift as a temporal insight
- Cross-user patterns — “Users who set alarms via WhatsApp have 30% more follow-through on tasks than users who create tasks via the web interface”
This segmentation feeds directly into product decisions. If WhatsApp-originated tasks have higher completion rates, that’s a signal to invest more in the WhatsApp bridge experience.
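To make the idea tangible, here is a deliberately crude stand-in: reduce each user to a distribution over tool domains and label them by the dominant one. DBNL's segment discovery finds such groups without a predefined taxonomy; the prefix-to-domain mapping below is an assumption for illustration:

```python
from collections import Counter

# Hypothetical prefix-to-domain mapping; WorkingAgents' real taxonomy may differ.
DOMAINS = {"nis_": "crm", "task_": "tasks", "alarm_": "scheduling"}

def domain_profile(calls: list[str]) -> dict[str, float]:
    """Fraction of a user's tool calls falling in each domain."""
    counts = Counter()
    for tool in calls:
        for prefix, domain in DOMAINS.items():
            if tool.startswith(prefix):
                counts[domain] += 1
                break
    total = sum(counts.values()) or 1
    return {d: c / total for d, c in counts.items()}

def dominant_segment(calls: list[str]) -> str:
    profile = domain_profile(calls)
    return max(profile, key=profile.get) if profile else "unclassified"
```

The interesting segments are precisely the ones a hand-written mapping like this would miss, which is the argument for letting unsupervised clustering draw the boundaries.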
3. Multi-Provider Model Comparison — Beyond Scores
Other evaluation platforms compare models with predefined metrics: accuracy, latency, cost. Distributional adds a dimension they can’t: behavioral fingerprint comparison.
WorkingAgents users can switch between Claude, OpenRouter models, and Perplexity at runtime. Distributional wouldn’t just score each provider — it would discover how the behavioral distribution changes:
- “Claude sessions produce 4 distinct topic clusters. GPT-4o sessions produce 6 — the extra two clusters correspond to edge-case queries where GPT-4o attempts more complex tool chains.”
- “Perplexity sessions show lower latency but a temporal drift in tool selection accuracy over multi-turn conversations — performance degrades after turn 5.”
- "OpenRouter Llama sessions cluster differently from proprietary models on CRM queries — they under-use `nis_search` and over-rely on `nis_list_contacts` with broad filters."
This is richer than “Model A scored 4.2, Model B scored 3.8.” It reveals how models behave differently, not just how well.
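Comparing fingerprints rather than scores can start with something as simple as a distance between each provider's tool-usage distributions. A sketch using total variation distance (the per-provider call logs are invented):

```python
from collections import Counter

def tool_distribution(calls: list[str]) -> dict[str, float]:
    """Normalize a call log into a tool-usage distribution."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def total_variation(p: dict, q: dict) -> float:
    """Total variation distance: 0.0 = identical usage, 1.0 = disjoint."""
    tools = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tools)

# Invented per-provider call logs for illustration.
claude = tool_distribution(
    ["nis_search", "nis_search", "nis_list_contacts", "nis_get_contact"])
llama = tool_distribution(
    ["nis_list_contacts"] * 3 + ["nis_get_contact"])
```

A single number like this flags that providers behave differently; the fingerprint view then shows where: which tools, which clusters, which turns.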
4. The AI Data Flywheel for Consulting Clients
Distributional explicitly positions their platform around the “Analytics-Driven AI Data Flywheel” — using discovered signals and surfaced examples for post-training optimization. For WorkingAgents’ consulting business, this creates a concrete service offering:
Month 1: Deploy — Install WorkingAgents with custom tools for the client’s domain. Connect DBNL to ingest traces.
Month 2: Discover — DBNL surfaces behavioral patterns. “Your agents handle inventory queries well but struggle with multi-step procurement workflows. Here are the 47 example traces showing the failure pattern.”
Month 3: Optimize — Use the surfaced examples for prompt engineering, tool redesign, or model switching. DBNL’s tracked segments measure whether the changes worked.
Month 4+: Flywheel — Each optimization cycle surfaces new patterns in the changed behavior. The agent gets measurably better every month, with evidence.
This is recurring revenue built on data, not opinion. The consulting engagement doesn’t end after deployment — it becomes a continuous optimization service powered by Distributional’s discovery engine.
5. Permission and Access Pattern Analytics
WorkingAgents’ capability-based access control system creates a rich dataset: which users have which permissions, which tools they actually call, and how their usage patterns differ from their permission scope.
Distributional could surface insights like:
- "Users with `task_manager` permission but not `nis` permission ask CRM-related questions 15% of the time — hitting permission denials. Consider expanding their access or improving the agent's handling of out-of-scope requests."
- "Temporary access keys (TTL-based) show a different behavioral distribution than permanent keys — users with temp keys complete tasks 25% faster, possibly due to urgency."
- "A new behavioral segment emerged this week: users who chain `nis_create_contact` → `task_create` → `task_link`. This workflow isn't documented but appears intentional and productive."
6. Temporal Drift Detection on Agent Behavior
AI agents aren’t static. Model updates, prompt changes, data drift, and user behavior evolution all cause the behavioral fingerprint to shift. WorkingAgents currently has no way to detect these shifts.
Distributional’s temporal insights would catch:
- Model update drift — When Anthropic updates Claude, does the agent’s tool selection distribution change? Do certain tool chains break?
- Prompt engineering impact — After modifying system prompts, did the behavioral fingerprint change in the intended direction? Or did it also shift in unexpected dimensions?
- Seasonal patterns — Do end-of-month CRM queries spike? Do task creation patterns follow weekly cycles?
- User adaptation — As users learn the system, does their interaction pattern evolve? Do they discover more efficient tool chains over time?
These temporal signals are invisible to snapshot-based evaluation tools. They only emerge from continuous distributional analysis.
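One standard way to quantify this kind of shift is the Population Stability Index, computed over, say, the current week's tool-selection distribution versus a 30-day baseline. A minimal sketch (the thresholds in the docstring are the conventional rule of thumb, not DBNL's):

```python
import math

def psi(baseline: dict[str, float], current: dict[str, float],
        eps: float = 1e-6) -> float:
    """Population Stability Index between two categorical distributions.

    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant.
    """
    cats = set(baseline) | set(current)
    score = 0.0
    for c in cats:
        b = baseline.get(c, 0.0) + eps  # smooth so log stays defined
        a = current.get(c, 0.0) + eps
        score += (a - b) * math.log(a / b)
    return score
```

Run nightly against a rolling baseline, a check like this turns "the agent feels different since the model update" into a number with a date attached.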
Why Distributional Is Different from Arize or Deepchecks
Distributional occupies a distinct position in the AI analytics landscape:
| Dimension | Arize AI | Deepchecks | Distributional |
|---|---|---|---|
| Core approach | Trace observability | Evaluation scoring | Behavioral discovery |
| Primary question | “What happened?” | “Was it good?” | “What patterns exist?” |
| Method | OpenTelemetry spans | LLM-as-Judge swarm | Unsupervised statistical analysis |
| Requires predefined metrics | Partially | Yes | No — discovers metrics |
| Deployment | Cloud SaaS or self-hosted | Cloud SaaS or on-prem | Free, open, in your VPC |
| Best for | Debugging specific failures | Scoring agent quality | Finding unknown unknowns |
For WorkingAgents, these three platforms are complementary layers:
- Arize traces what happened (the execution path)
- Deepchecks evaluates whether it was done well (the quality score)
- Distributional discovers what you should be paying attention to (the behavioral signal)
Distributional fills the gap between “we monitor our agents” and “we understand our agents.”
The Gap Analysis
| WorkingAgents Gap | Distributional Solution |
|---|---|
| No behavioral pattern discovery | Unsupervised learning surfaces unknown clusters, shifts, and outliers |
| No tool usage correlation analysis | Distributional fingerprint captures tool-sequence-to-outcome correlations |
| No user segmentation analytics | Segment Insights automatically cluster user behavior profiles |
| No temporal drift detection | Temporal Insights surface behavioral shifts over time |
| No data flywheel for continuous improvement | Adaptive Analytics Flywheel with surfaced examples for optimization |
| No multi-dimensional model comparison | Behavioral fingerprint comparison across providers |
| Distributional Gap | WorkingAgents Solution |
|---|---|
| Need production agent data sources | 50+ MCP tool traces with rich business context |
| Need diverse tool-calling patterns | CRM + tasks + content + communication tool chains |
| Need multi-provider comparison scenarios | Runtime-switchable Claude/OpenRouter/Perplexity |
| Need consulting distribution channel | AI consulting firm deploying for medium-size companies |
| Need non-Python ecosystem references | Elixir OTP — unique agent orchestration stack |
| Need real business outcome data | CRM pipeline, task completion, follow-up tracking |
Partnership Models
Technology Integration
The natural starting point. WorkingAgents emits OpenTelemetry traces from its MCP dispatcher. DBNL ingests, enriches, and analyzes.
- DBNL deploys in WorkingAgents’ infrastructure — free, open, in the same VPC. Zero data leaves.
- Default metrics (answer relevancy, user frustration, topic classification) run automatically on every interaction.
- Custom metrics specific to WorkingAgents’ domain — CRM data accuracy, task completion rates, tool chain efficiency.
WorkingAgents gains: Behavioral discovery and continuous improvement analytics without building ML infrastructure. Distributional gains: A production MCP reference customer on Elixir/OTP with rich multi-tool, multi-provider agent data.
Consulting Partnership
Distributional’s team comes from SigOpt, Bloomberg, Google, Meta, Stripe, and Uber. They understand enterprise AI deployment. WorkingAgents’ consulting firm deploys agents for medium-size companies. The partnership creates a joint offering:
- WorkingAgents consulting: Deploys the agent orchestration layer
- DBNL: Provides the analytics layer that proves agents are working and improving
- Joint deliverable: Monthly behavioral intelligence reports showing discovered patterns, optimization recommendations, and measured improvements
For Distributional, this is channel distribution through consulting engagements. For WorkingAgents, this is a continuous-improvement service tier that generates recurring revenue.
Co-Marketing: “The Open AI Agent Analytics Stack”
Both companies share a deployment philosophy: open, self-hosted, data stays in your environment. A joint positioning as “the open stack for production AI agents” — orchestration (WorkingAgents) plus analytics (DBNL) — differentiates from cloud-locked alternatives.
Distributional’s $30M in funding from a16z and Two Sigma Ventures gives them marketing reach. A case study showing DBNL discovering behavioral patterns in a production MCP agent platform would be distinctive content for both companies.
Recommended Next Steps
- Deploy DBNL sandbox — Distributional offers a free sandbox at docs.dbnl.com. Connect WorkingAgents' MCP dispatcher traces. See what the platform discovers from even a week of production data.
- Instrument the MCP dispatcher — Add OpenTelemetry span emission to `MyMCPServer.Manager`. Include tool name, parameters, user ID, provider, and session ID as span attributes. This is the minimum data DBNL needs.
- Run the flywheel once — Ingest a month of traces. Let DBNL's unsupervised analysis run. Review the Insights. Pick one discovered pattern and optimize for it. Measure the result. This single cycle demonstrates the value proposition to consulting clients.
- Contact Distributional — They're a Series A company actively expanding. The SigOpt team built their reputation on optimization for enterprise AI. An MCP-native agent orchestration reference on a non-Python stack would be a differentiated story for their portfolio.
- Design the consulting package — "Managed AI Agent Operations with Behavioral Analytics" — deploy agents, connect DBNL, deliver monthly intelligence reports, continuously optimize. This is the recurring revenue model.
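Whatever language ends up emitting the spans (the real emitter would live in `MyMCPServer.Manager`, in Elixir), the attribute set is the same. Sketched here in Python; the key names are illustrative choices, not an established OpenTelemetry semantic convention:

```python
def span_attributes(tool: str, params: dict, user_id: str,
                    provider: str, session_id: str) -> dict:
    """Minimum per-tool-call span attribute set for DBNL ingestion.

    Key names are illustrative; pick one convention and keep it stable,
    because downstream enrichment keys on these names.
    """
    return {
        "mcp.tool.name": tool,
        "mcp.tool.params": str(params),  # OTel attribute values must be scalars
        "user.id": user_id,
        "llm.provider": provider,
        "session.id": session_id,
    }
```

Five attributes per call is enough for DBNL to reconstruct tool sequences, user segments, and provider comparisons; everything else can be layered on later.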
Conclusion
Distributional solves a problem that most AI teams don’t even know they have: the unknown unknowns in agent behavior. WorkingAgents builds the agents. Distributional discovers what those agents are actually doing in production — the behavioral patterns, correlations, clusters, and drifts that no amount of manual log reading or predefined metrics will surface.
The combination is particularly powerful for consulting. Walk into a client meeting and say: “We deploy AI agents, and we use statistical behavioral analysis to discover patterns in how they operate. Last month we found that your procurement agent was using an inefficient tool chain on 23% of requests. We optimized it. Here’s the before-and-after distributional fingerprint.”
That’s not a pitch. That’s evidence.
DBNL is free, open, and deploys in your infrastructure. The integration is OpenTelemetry — protocol-level, language-agnostic. The team is ex-SigOpt, Google, Meta, Bloomberg. The funding is a16z and Two Sigma. And they’re still early enough that a partnership conversation gets real attention.
The flywheel starts with one trace. Time to emit it.