WorkingAgents + Box: Letting Agents Work Across the Customer's Existing Content

Most companies do not need a new content store. They already have one. For a large share of enterprise teams that store is Box – contracts, statements of work, design files, scanned PDFs, marketing assets, photo libraries, training material, board decks, the messy archive of a decade of operations.

The interesting question is not “can we move this content into an AI-friendly system?” It is “can we let AI agents read, search, summarize, extract from, and write into the content store the customer already has – without giving every agent unfettered access?” That is the integration WorkingAgents is built to do.

This article describes that integration shape: what it looks like, what it unlocks, and what agents actually do with Box content when the access layer is right.

The picture in one paragraph

WorkingAgents is an AI Agent Gateway. It sits between AI agents and the customer’s tools, enforcing capability-based permissions and recording every call. Box is one of those tools. With a WorkingAgents-Box connection, agents addressing WorkingAgents can issue tool calls like box_search, box_get_metadata, box_extract_to_template, or box_upload_file. Each call is gated by a permission key the agent holds. The actual Box API call – with the customer’s enterprise Box credentials – happens inside WorkingAgents, never inside the agent.

The agent never sees the Box token. Box content reaches the agent’s prompt context only as far as the task requires. The audit log captures who called what, when, and with which arguments.
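A minimal sketch of that gateway shape, under stated assumptions: the Gateway class, the PermissionDenied exception, and the fake_box_search handler are hypothetical names invented for illustration, not the WorkingAgents implementation. It shows the three properties described above: the permission check happens before the call, every attempt is logged, and the token is handed only to the handler.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Set, Tuple


class PermissionDenied(Exception):
    """Raised when an agent lacks the key a tool requires."""


@dataclass
class Gateway:
    # Assumption: the Box credential is held here and never returned to agents.
    box_token: str
    # tool name -> (required permission key, handler(token, **args))
    tools: Dict[str, Tuple[str, Callable[..., Any]]] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def register(self, name: str, permission: str, handler: Callable[..., Any]) -> None:
        self.tools[name] = (permission, handler)

    def call(self, agent_id: str, agent_keys: Set[str], tool: str, **args: Any) -> Any:
        permission, handler = self.tools[tool]
        allowed = permission in agent_keys
        # Record every attempt, allowed or not: who, what, when, which arguments.
        self.audit_log.append({
            "agent": agent_id,
            "tool": tool,
            "args": args,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not allowed:
            raise PermissionDenied(f"{agent_id} lacks {permission} for {tool}")
        # The token goes to the handler only; the agent sees just the result.
        return handler(self.box_token, **args)


def fake_box_search(token: str, query: str) -> List[str]:
    # Stand-in for the real Box API call made inside the gateway.
    return [f"match for {query!r}"]
```

An agent holding box.read can call box_search through gateway.call(...); an agent with no keys gets PermissionDenied, and both attempts appear in audit_log.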

That is the architecture. The rest is what it lets you build.

What’s actually in Box

The mistake most “AI for the enterprise” projects make is treating the content store as a homogeneous bucket of files. Box is more layered than that, and each layer is useful for different agent work.

The folder tree

The familiar part. Folders, subfolders, files. Permissions inherit. Box’s collaborator model lets you express “this user can view, that group can edit, the agent service account can only read this subtree.”

The agent value here is mechanical: agents can list_folder, traverse_path, and find_files_matching_pattern without an extra search layer. For workflows that follow a known folder structure (Customers / ACME / Contracts / 2026 / *.pdf), traversal is fast and predictable.
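As an illustration of pattern-based traversal, here is a small sketch that walks an in-memory folder tree and matches paths against a glob. The nested-dict tree is a stand-in for what a folder-listing tool might return; the real response shape is not specified in this article, so treat the data structure as an assumption.

```python
from fnmatch import fnmatchcase


def walk(tree, prefix=""):
    """Yield the full path of every file in a nested dict tree.

    Assumption: folders are dicts, files are any non-dict value.
    """
    for name, child in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        if isinstance(child, dict):
            yield from walk(child, path)
        else:
            yield path


def find_files_matching_pattern(tree, pattern):
    # Note: in fnmatch, "*" also matches "/", so the pattern below would
    # match files in deeper subfolders of 2026 as well.
    return sorted(p for p in walk(tree) if fnmatchcase(p, pattern))


example_tree = {
    "Customers": {
        "ACME": {
            "Contracts": {
                "2025": {"old.pdf": None},
                "2026": {"msa.pdf": None, "sow.pdf": None, "notes.txt": None},
            }
        }
    }
}

matches = find_files_matching_pattern(example_tree, "Customers/ACME/Contracts/2026/*.pdf")
```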

The metadata layer

Box’s metadata templates are the structured layer most teams underuse. A metadata template is a typed schema attached to a file: fields like Customer ID (string), Contract Value (number), Renewal Date (date), Status (enum). Once a file has metadata, it is queryable as structured data – you can ask “every contract with renewal date in the next 60 days where Status is ‘open’” without reading file bodies.
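To make the renewal query concrete, here is a sketch of the same filter applied to metadata records already loaded into memory. The record layout (file_id, status, renewal_date) is a hypothetical simplification; in practice the filtering would be pushed down to Box's metadata search rather than done client-side.

```python
from datetime import date, timedelta


def renewals_due(records, today, window_days=60, status="open"):
    """Return file ids whose renewal date falls within the window.

    Files with no renewal_date (untagged documents) are skipped, which is
    exactly the gap that extraction is meant to close.
    """
    cutoff = today + timedelta(days=window_days)
    return [
        r["file_id"]
        for r in records
        if r.get("status") == status
        and r.get("renewal_date") is not None
        and today <= r["renewal_date"] <= cutoff
    ]


records = [
    {"file_id": "A", "status": "open", "renewal_date": date(2026, 2, 1)},
    {"file_id": "B", "status": "open", "renewal_date": date(2026, 6, 1)},
    {"file_id": "C", "status": "closed", "renewal_date": date(2026, 1, 15)},
    {"file_id": "D", "status": "open"},  # never tagged with a renewal date
]
due = renewals_due(records, today=date(2026, 1, 1))
```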

Most Box tenants have metadata templates defined but unevenly applied. Documents from before the template existed are bare. New documents may or may not be tagged depending on whether a human remembered. This is where AI agents earn their cost: they read the unstructured file body, extract the fields, and write them back to the metadata.

Box Extract and custom extraction agents

Box shipped Box Extract in January 2026 as a first-party feature. It runs AI-driven extraction against unstructured content and writes the result into a metadata template. Custom Extract Agents (Enterprise Advanced plan) let you describe the extraction task in natural language – “find the contract value, the renewal date, and the responsible party in this contract PDF” – and Box handles the model invocation and the metadata write.

For WorkingAgents customers, this matters because the integration does not need to reinvent extraction. WA tools can wrap Box Extract: a box_extract_to_template call routes through WorkingAgents permission gating, hits the Box Extract API, and returns the extracted fields. The expensive bit (the model run) happens inside Box’s infrastructure. WorkingAgents adds the access control and the audit trail.

The unstructured layer

Below the metadata is the raw content – PDFs, Word docs, slides, spreadsheets, images, video, audio, ZIP archives. For each format, the agent question is the same: can I get the meaningful content out of this file?

Box helps in two ways:

These are useful primitives. An agent that wants to “scan the design assets folder and find every image with the old logo” doesn’t need to download every full-resolution TIFF – it can pull the rendered JPEG representation and run vision on that.

The WorkingAgents tool surface

A first-pass set of WA tools for Box, gated by per-tool permission keys following the existing AccessControlled pattern:

Tool                          Permission          Purpose
box_search                    box.read            Full-text and metadata search across the tenant
box_list_folder               box.read            Folder listing with file metadata
box_get_file                  box.read            Download file content or representation (text / thumbnail / PDF)
box_get_metadata              box.read            Read structured metadata on a file
box_ai_ask                    box.ai.ask          Natural-language question routed to Box AI grounded in a file or folder
box_extract_to_template       box.write.metadata  Run Box Extract against a file and write fields to a metadata template
box_upload_file               box.write.content   Upload a new file to a specified folder
box_create_metadata_template  box.admin           Define a new metadata template
box_share_link                box.share           Generate a shared link with the specified access level

Splitting read, AI-query, metadata-write, content-write, sharing, and admin into separate keys is the point. A research agent gets box.read and box.ai.ask only. A pipeline that auto-tags documents gets box.read and box.write.metadata. A consumer-facing chatbot that returns a shared link gets box.read and box.share but never box.write.content. The customer keeps the keys to their own kingdom.
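The key split can be expressed as a plain lookup. The sketch below copies the tool-to-permission mapping from the table above; the allowed_tools helper is a hypothetical name used only to show which tools each example agent ends up with.

```python
# Tool -> required permission key, as listed in the table above.
TOOL_PERMISSIONS = {
    "box_search": "box.read",
    "box_list_folder": "box.read",
    "box_get_file": "box.read",
    "box_get_metadata": "box.read",
    "box_ai_ask": "box.ai.ask",
    "box_extract_to_template": "box.write.metadata",
    "box_upload_file": "box.write.content",
    "box_create_metadata_template": "box.admin",
    "box_share_link": "box.share",
}


def allowed_tools(keys):
    """Return the sorted list of tools an agent holding `keys` may call."""
    return sorted(tool for tool, perm in TOOL_PERMISSIONS.items() if perm in keys)


research_agent = allowed_tools({"box.read", "box.ai.ask"})
tagging_pipeline = allowed_tools({"box.read", "box.write.metadata"})
chatbot = allowed_tools({"box.read", "box.share"})
```

With these key sets, the chatbot can search and share but box_upload_file never appears in its list.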

What agents actually do with this

Concrete patterns, not aspirational ones:

Contract management

The agent watches a Box folder for new contract PDFs. On each arrival:

  1. Pulls the text representation (box_get_file with type=text).
  2. Runs Box Extract against a Contract metadata template (box_extract_to_template).
  3. If the extracted Renewal Date is within 90 days, fires a notification via the customer’s chosen channel (Slack, email, internal task system).
  4. If the extracted Contract Value is over a configurable threshold, posts a summary into a review folder for human approval before further action.

Total agent code: a few dozen lines. The hard work (OCR, parsing, structured extraction) lives in Box. The access control lives in WorkingAgents. The agent is glue.
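A sketch of that glue, assuming a call_tool(name, **args) function that routes through the gateway and a notify(message) callback for the chosen channel. The argument names (file_id, type, template, folder, name, content), the extracted field names, and the default threshold are all hypothetical; the article does not define the tool signatures.

```python
from datetime import date, timedelta


def handle_new_contract(call_tool, notify, file_id, today,
                        renewal_window_days=90, value_threshold=100_000):
    """Run the four steps for one new contract PDF; return the actions taken."""
    # Step 1: pull the text representation.
    text = call_tool("box_get_file", file_id=file_id, type="text")
    # Step 2: extract fields into the Contract metadata template.
    fields = call_tool("box_extract_to_template", file_id=file_id, template="Contract")
    actions = []

    # Step 3: notify if the renewal date falls inside the window.
    renewal = fields.get("renewal_date")
    if renewal is not None and today <= renewal <= today + timedelta(days=renewal_window_days):
        notify(f"Contract {file_id} renews on {renewal.isoformat()}")
        actions.append("notified")

    # Step 4: queue a summary for human review above the value threshold.
    # This step needs box.write.content in addition to the read and
    # metadata-write keys.
    value = fields.get("contract_value")
    if value is not None and value > value_threshold:
        summary = f"Contract {file_id}: value {value}. Excerpt: {text[:200]}"
        call_tool("box_upload_file", folder="Review",
                  name=f"{file_id}-summary.txt", content=summary)
        actions.append("queued_for_review")
    return actions
```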

Knowledge base for support

A customer support agent answers a user question. Before answering, it issues box_search for related documents in the support-content folder, then box_ai_ask to ground its answer in the matched files. The reply cites the Box file URL so a reader can check every claim against its source. If the user asks for “the full procedure,” the agent returns a Box shared link via box_share_link rather than pasting the text.

Permission scope on the agent: box.read, box.ai.ask, box.share. Cannot upload, cannot modify metadata, cannot escalate.
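The search-then-ground flow might look like the sketch below. It assumes a call_tool(name, **args) function routed through the gateway, search hits shaped as dicts with id and url keys, and argument names (query, folder, prompt, file_ids) that are illustrative rather than documented.

```python
def answer_with_citations(call_tool, question, folder="support-content", max_files=3):
    """Search first, then ask Box AI about only the matched files."""
    hits = call_tool("box_search", query=question, folder=folder)[:max_files]
    if not hits:
        # Nothing to ground on: return no answer rather than an ungrounded one.
        return {"answer": None, "sources": []}
    answer = call_tool("box_ai_ask", prompt=question,
                       file_ids=[hit["id"] for hit in hits])
    # Return the source URLs alongside the answer so the reply can cite them.
    return {"answer": answer, "sources": [hit["url"] for hit in hits]}
```

Only box_search and box_ai_ask are called here, so box.read and box.ai.ask are sufficient for this part of the flow; box.share is needed only when a shared link is requested.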

Marketing asset discovery

A marketing agent has been asked for “every photo we have of the new product line in outdoor settings.” It issues box_search filtered to the Marketing / Photos / 2026 folder, then iterates through results pulling the thumbnail representation (box_get_file with type=thumbnail), runs a vision model on each, and returns a ranked list. Full-resolution downloads only happen on the human’s pick.

Bandwidth and cost stay low because thumbnails are tiny and Box-rendered.
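A sketch of the thumbnail-first ranking loop, assuming a call_tool(name, **args) function routed through the gateway and a score_image(thumbnail) function wrapping whatever vision model is in use. Both are injected, and the argument names and hit shape are hypothetical.

```python
def rank_by_thumbnail(call_tool, score_image, query, folder, top_k=5):
    """Score each search hit by its thumbnail and return the best file ids."""
    hits = call_tool("box_search", query=query, folder=folder)
    scored = []
    for hit in hits:
        # Only the small rendered thumbnail is fetched, never the original.
        thumb = call_tool("box_get_file", file_id=hit["id"], type="thumbnail")
        scored.append((score_image(thumb), hit["id"]))
    # Highest score first; ties keep search order because sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [file_id for _, file_id in scored[:top_k]]
```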

Invoice and AP processing

Invoice PDFs land in a Box folder via email-to-Box ingestion. A WorkingAgents pipeline:

  1. Detects the new file via Box webhook (relayed to WA).
  2. Runs box_extract_to_template against an Invoice template (vendor, amount, due date, line items).
  3. Cross-references vendor against an existing structured list (held in the customer’s CRM or in WA’s own Sqler).
  4. If recognized and the amount is below the auto-approve threshold, posts to the AP system. If above, routes to a human review folder.
  5. Writes the agent’s decision and reasoning back as a comment on the Box file – preserving the audit trail in the same content store where the invoice lives.

This is the pattern that pays for the integration. AP teams spend hours doing this manually.
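The routing decision in step 4 reduces to a small pure function. In this sketch the field names (vendor, amount) are hypothetical, and sending unrecognized vendors or missing amounts to human review is an assumption; the steps above only specify the recognized-vendor branches. The returned reason string is what would be written back as the comment in step 5.

```python
def route_invoice(fields, known_vendors, auto_approve_limit):
    """Return (decision, reason) for one extracted invoice."""
    vendor = fields.get("vendor")
    amount = fields.get("amount")
    if vendor not in known_vendors:
        # Assumption: unknown vendors always go to a human.
        return "human_review", f"vendor {vendor!r} not recognized"
    if amount is None:
        # Assumption: a failed extraction is never auto-approved.
        return "human_review", "amount could not be extracted"
    if amount < auto_approve_limit:
        return "post_to_ap", f"recognized vendor, amount {amount} below limit {auto_approve_limit}"
    return "human_review", f"amount {amount} at or above limit {auto_approve_limit}"
```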

Personal productivity, not just enterprise

The same primitives work for a single user. The Personal Box plan (or a Business seat at a small company) has the same API surface. An individual agent setup can:

The agent code is small. The Box-side smarts and the WorkingAgents access control do the heavy lifting.

Structured vs unstructured: the actual workflow

A practical lens for any agent-on-Box workflow:

  1. Start with structure if there is any. Search metadata first. It’s faster, cheaper, and more deterministic than reading file bodies.
  2. Extract structure when there isn’t. Box Extract via box_extract_to_template turns unstructured content into queryable metadata. Run extraction once, query the metadata many times.
  3. Reach into file bodies only when structure isn’t enough. Use box_ai_ask for narrow questions on specific files. It’s the most expensive call. Don’t ask it questions the metadata could answer.
  4. Surface results back as structure. If the agent learns something new from the file body, write it to a metadata template. The next agent shouldn’t have to re-derive what the first one already figured out.

This four-step pattern is what “AI for unstructured data” looks like when done correctly. The pile gets smaller every pass.
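The four steps can be sketched as a single lookup with fallbacks. This assumes a call_tool(name, **args) function routed through the gateway; the argument names are illustrative, and box_set_metadata is a hypothetical write tool that does not appear in the tool table above (it stands in for whatever call persists a single field).

```python
def get_field(call_tool, file_id, template, field_name, question):
    """Resolve one field, cheapest source first; return (value, source)."""
    # 1. Start with structure: read existing metadata.
    meta = call_tool("box_get_metadata", file_id=file_id, template=template) or {}
    if meta.get(field_name) is not None:
        return meta[field_name], "metadata"

    # 2. Extract structure when there is none. The extraction call itself
    #    writes its results to the template.
    extracted = call_tool("box_extract_to_template", file_id=file_id, template=template) or {}
    if extracted.get(field_name) is not None:
        return extracted[field_name], "extract"

    # 3. Reach into the file body only as a last resort.
    answer = call_tool("box_ai_ask", prompt=question, file_ids=[file_id])
    if answer is None:
        return None, "not_found"

    # 4. Surface the result back as structure so the next agent can skip
    #    steps 2 and 3. box_set_metadata is a hypothetical tool name.
    call_tool("box_set_metadata", file_id=file_id, template=template,
              values={field_name: answer})
    return answer, "ai_ask"
```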

Implementation realities

Honest assessment of where this integration earns its cost and where it doesn’t:

Where it earns its cost

Where it doesn’t

Things to plan for

Bottom line

Box is where a lot of enterprise content actually lives. The reason agents-on-Box rarely ship today is not the agents and not Box – it’s the access layer. Hand an agent a Box service account and you’ve created a security incident waiting to happen. Hand it no access and it can’t do useful work.

WorkingAgents is the missing layer. Per-tool permission keys, audit logging, capability scoping for each agent. Box’s API and Box’s AI primitives do the actual content work. The combination lets a customer keep their content where it is, give agents the access they need and nothing more, and get an audit trail of every interaction.

For an AI consulting practice, this is one of the cleaner first deals to ship: the customer already has Box, the value (auto-tagging, search grounding, document processing) is measurable in hours saved per week, and the access control story is the part that lets a CIO say yes.