WorkingAgents + Arize AI: A Natural Partnership for the Agent Engineering Stack

By James Aspinwall, co-written by Alfred Pennyworth (my trusted AI) — March 7, 2026, 17:25


The Thesis

WorkingAgents builds the agent orchestration layer — the brain that coordinates AI agents, tools, CRM, task management, and human communication. Arize AI builds the observability and evaluation layer — the eyes that watch what agents actually do, measure whether they’re doing it well, and surface where they fail.

These two platforms occupy adjacent, non-overlapping layers of the AI agent stack. Together, they form a complete build-observe-improve loop that neither can deliver alone.


What WorkingAgents Brings

WorkingAgents (“The Orchestrator”) is an MCP-powered agent orchestration platform built on Elixir OTP. It gives AI agents real tools to do real work: CRM, task management, and human-communication tools, dispatched through an MCP server exposing 50+ tools.

WorkingAgents solves the “what agents can do” problem. It’s the runtime where agents act.

What Arize AI Brings

Arize AI is an agent and AI engineering platform for observing, evaluating, and improving AI agents and LLM applications. Their stack spans OpenInference tracing (via the open-source Phoenix or the enterprise Arize AX), the Evaluator Hub, Online Evaluations, and the Prompt Playground.

Arize solves the “how well agents perform” problem. It’s the feedback loop that makes agents better.


Where the Synergy Lives

1. Tool Call Observability — The Immediate Win

WorkingAgents dispatches calls to 50+ tools through its MCP server. Every call is a decision point: Did the agent pick the right tool? Did it pass the right parameters? Did the result make sense?

Today, WorkingAgents has basic monitoring (health endpoints, process counts, memory usage) but no distributed tracing of tool invocations. There’s no correlation ID linking a user’s chat message to the chain of tool calls it triggered.

Arize’s OpenInference tracing would give WorkingAgents exactly this. Each MCP tool call becomes a span. Each chat session becomes a trace. Each user becomes a session. Suddenly you can see which tool the agent picked, what parameters it passed, how long the call took, and whether the result made sense.

Integration path: Emit OpenTelemetry spans from MyMCPServer.Manager when dispatching tool calls. Arize Phoenix (self-hosted) or Arize AX (cloud) ingests the traces. Zero changes to business logic.
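The correlation this buys can be sketched in a few lines of pure Python (a toy model: `ToolSpan` and `emit_tool_span` are illustrative names, and a real integration would use the OpenTelemetry SDK rather than hand-rolled structs):

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class ToolSpan:
    """One MCP tool invocation, correlated back to its originating chat session."""
    trace_id: str          # shared by every tool call in one chat session
    span_id: str
    tool_name: str
    params: dict
    started_at: float
    ended_at: float = 0.0
    status: str = "unset"

def emit_tool_span(trace_id, tool_name, params, run):
    """Wrap a tool dispatch in a span; `run` stands in for the actual tool."""
    span = ToolSpan(trace_id=trace_id, span_id=uuid.uuid4().hex[:16],
                    tool_name=tool_name, params=params, started_at=time.time())
    try:
        run(params)
        span.status = "ok"
    except Exception:
        span.status = "error"
    span.ended_at = time.time()
    return span

# One chat session == one trace: every dispatched call carries the same trace_id,
# which is exactly the correlation that health endpoints and process counts lack.
session_trace = uuid.uuid4().hex
spans = [
    emit_tool_span(session_trace, "crm.lookup", {"email": "a@b.co"}, lambda p: p),
    emit_tool_span(session_trace, "tasks.create", {"title": "Follow up"}, lambda p: p),
]
```

The same shape translates directly to OpenTelemetry span attributes, which is why no business logic needs to change.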

2. Agent Quality Evaluation — The Strategic Win

WorkingAgents runs multi-turn conversations where agents use tools to accomplish tasks. But there’s no automated way to answer: “Was this interaction good?”

Arize’s Evaluator Hub provides exactly this capability, pairing tool-calling evaluation templates with Online Evaluations for continuous monitoring of agent behavior.
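As a toy illustration of what a per-interaction evaluation checks (a hand-rolled rubric, not Arize’s actual templates, which typically score from the trace with LLM-as-judge evaluators):

```python
def evaluate_tool_use(expected_tool, call):
    """Toy rubric for one tool-use turn: right tool chosen, required
    parameters supplied, and a non-error result returned."""
    checks = {
        "right_tool": call["tool"] == expected_tool,
        "params_present": all(v is not None for v in call["params"].values()),
        "no_error": call["status"] == "ok",
    }
    # Overall score is the fraction of checks that passed.
    return {**checks, "score": sum(checks.values()) / len(checks)}

good = evaluate_tool_use(
    "crm.lookup",
    {"tool": "crm.lookup", "params": {"email": "a@b.co"}, "status": "ok"},
)
bad = evaluate_tool_use(
    "crm.lookup",
    {"tool": "web.search", "params": {"query": None}, "status": "error"},
)
```

Run continuously over production traces, scores like these become the regression signal that “was this interaction good?” currently lacks.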

3. Multi-Provider Model Comparison

WorkingAgents is provider-agnostic — users switch between Claude, OpenRouter models, and Perplexity at runtime. This creates a natural experiment: Which provider handles which tool-use patterns best?

Arize’s side-by-side evaluation in the Prompt Playground would let WorkingAgents benchmark providers against the same real-world prompts, measuring tool-use success, latency, and token usage for each.

This data directly informs which provider to default to for different task types — a competitive advantage for WorkingAgents’ consulting clients.
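In code, the comparison reduces to grouping per-call trace records by provider. A sketch (the records below are made-up numbers, purely to show the aggregation):

```python
from statistics import mean

# Hypothetical per-call records of the kind a trace backend would export.
calls = [
    {"provider": "claude",     "latency_ms": 820, "tool_ok": True,  "tokens": 1450},
    {"provider": "claude",     "latency_ms": 910, "tool_ok": True,  "tokens": 1600},
    {"provider": "openrouter", "latency_ms": 640, "tool_ok": False, "tokens": 1200},
    {"provider": "openrouter", "latency_ms": 700, "tool_ok": True,  "tokens": 1100},
]

def benchmark(calls):
    """Per-provider tool-use success rate, latency, and token usage."""
    report = {}
    for provider in sorted({c["provider"] for c in calls}):
        rows = [c for c in calls if c["provider"] == provider]
        report[provider] = {
            "success_rate": mean(c["tool_ok"] for c in rows),
            "avg_latency_ms": mean(c["latency_ms"] for c in rows),
            "avg_tokens": mean(c["tokens"] for c in rows),
        }
    return report

report = benchmark(calls)
```

The resulting table is precisely the evidence needed to pick a default provider per task type.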

4. Permission and Security Auditing

WorkingAgents has a sophisticated access control system where every tool call is gated by capability-based permissions. Arize’s tracing would add a security dimension: every gated tool call becomes an auditable span, recording which agent invoked which capability, with what parameters, and when.

This transforms Arize from an observability tool into a security monitoring layer for WorkingAgents’ access control system.
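A sketch of the audit this enables, assuming a deliberately simplified capability model (agent id mapped to a set of permitted tool names; WorkingAgents’ real permission system is richer than this):

```python
def audit_permissions(tool_spans, granted):
    """Flag tool calls that fall outside an agent's granted capability set."""
    return [
        {"agent": s["agent"], "tool": s["tool"], "issue": "outside_granted_capabilities"}
        for s in tool_spans
        if s["tool"] not in granted.get(s["agent"], set())
    ]

granted = {"support-bot": {"crm.lookup", "tasks.create"}}
tool_spans = [
    {"agent": "support-bot", "tool": "crm.lookup"},      # permitted
    {"agent": "support-bot", "tool": "billing.refund"},  # outside its capability set
]
alerts = audit_permissions(tool_spans, granted)
```

Run against the trace stream, the same query becomes a continuous security monitor rather than a one-off audit.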

5. A2A Protocol Observability

WorkingAgents implements Google’s Agent-to-Agent (A2A) protocol, allowing external agents to discover and invoke its tools as “skills.” As the A2A ecosystem grows, observability becomes critical: you need to see which external agents invoke which skills, how often, and how those invocations perform.

Arize’s session-level tracing maps naturally to A2A task lifecycles — each A2A task becomes a trace, each skill invocation becomes a span.
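The mapping is mechanical, sketched below with hypothetical field names (one root span per A2A task, one child span per skill invocation):

```python
import uuid

def a2a_task_to_trace(task_id, skill_invocations):
    """Map one A2A task lifecycle onto one trace: the task is the root span,
    and each skill invocation becomes a child span under it."""
    trace_id = uuid.uuid4().hex
    root = {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
            "name": f"a2a.task:{task_id}", "parent_span_id": None}
    children = [
        {"trace_id": trace_id, "span_id": uuid.uuid4().hex[:16],
         "name": f"a2a.skill:{skill}", "parent_span_id": root["span_id"]}
        for skill in skill_invocations
    ]
    return [root] + children

trace = a2a_task_to_trace("task-42", ["search_contacts", "draft_email"])
```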


Partnership Models

Technology Integration Partner

The most natural first step. WorkingAgents integrates Arize Phoenix (open-source) or Arize AX (enterprise) as its observability backend, consuming the traces emitted from its MCP dispatcher.

Consulting Channel Partner

James is building an AI consulting firm focused on AI integration for mid-size companies. Arize has an active GSI/consulting partner program (they recently partnered with Infogain for exactly this model), making it a natural channel fit.

Co-Marketing / Case Study

WorkingAgents’ architecture — Elixir OTP, MCP protocol, multi-provider LLM, A2A interop — is technically distinctive. A joint case study showing Arize observability on a non-Python, non-TypeScript agent platform would differentiate both companies.


The Gap Analysis — What Each Needs From the Other

WorkingAgents Gap → Arize Solution

  No distributed tracing → OpenInference spans + Phoenix/AX collector
  No tool-call quality metrics → Evaluator Hub with tool-calling templates
  No provider comparison framework → Prompt Playground side-by-side evaluation
  No regression testing for agent behavior → Online Evaluations with continuous monitoring
  No cost-per-outcome tracking → Span-level token usage and latency metrics

Arize Gap → WorkingAgents Solution

  Limited MCP-native examples → Full 50+ tool MCP server implementation
  Few Elixir/BEAM ecosystem references → Production Elixir OTP agent orchestration
  Need consulting channel partners → AI consulting firm with Florida presence
  Need A2A protocol observability stories → Working A2A implementation with skill discovery

Recommended Next Steps

  1. Prototype integration — Add OpenTelemetry span emission to WorkingAgents’ MCP dispatcher. Point at a self-hosted Phoenix instance. Prove the tracing works end-to-end in a week.

  2. Reach out to Arize partnerships — Arize is actively hiring a GSI/Consulting Partnerships Manager and recently launched partnerships with Infogain and Google Cloud. The timing is right for new consulting partners.

  3. Build a demo — Record a session showing: user sends WhatsApp message → agent selects tools → tools execute → results returned — all visible in Arize’s trace view. This becomes the pitch deck for joint consulting engagements.

  4. Propose a case study — “MCP Agent Observability on Elixir OTP” is a story nobody else is telling. Arize’s content team would likely be interested.


Conclusion

WorkingAgents and Arize AI are two sides of the same coin: one builds the agent runtime, the other builds the agent feedback loop. Neither competes with the other. Both are stronger together.

For James’s consulting firm, the combination is particularly powerful: walk into a client meeting offering both “we’ll build your AI agents” and “we’ll show you exactly how they perform.” That’s a hard pitch to say no to.

The integration is technically straightforward (OpenTelemetry is protocol-level, not language-level), the partnership timing is right (Arize is actively expanding their consulting partner network), and the market positioning is complementary (orchestration + observability = complete agent engineering).

Time to make the call.

