Most companies do not need a new content store. They already have one. For a large share of enterprise teams that store is Box – contracts, statements of work, design files, scanned PDFs, marketing assets, photo libraries, training material, board decks, the messy archive of a decade of operations.
The interesting question is not “can we move this content into an AI-friendly system?” It is “can we let AI agents read, search, summarize, extract from, and write into the content store the customer already has – without giving every agent unfettered access?” That is the integration WorkingAgents is built to do.
This article describes that integration shape: what it looks like, what it unlocks, and what agents actually do with Box content when the access layer is right.
The picture in one paragraph
WorkingAgents is an AI Agent Gateway. It sits between AI agents and the customer’s tools, enforcing capability-based permissions and recording every call. Box is one of those tools. A WorkingAgents-Box connection means: agents addressing WorkingAgents can issue tool calls like box_search, box_get_file_metadata, box_extract_invoice_fields, or box_upload_summary. Each call is gated by a permission key the agent holds. The actual Box API call – with the customer’s enterprise Box credentials – happens inside WorkingAgents, never inside the agent.
The agent never sees the Box token. Box content enters the agent’s prompt context only as far as the task requires. The audit log captures who called what, when, and with which arguments.
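A minimal sketch of that gating, assuming an in-memory gateway. The `Gateway` class, its `register` and `call` methods, and the stubbed `fake_box_search` handler are hypothetical names for illustration, not the WorkingAgents API; the stub stands in for a real Box request made with credentials the agent never sees.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class Gateway:
    """Hypothetical gateway: checks a permission key, runs the tool, logs the call."""
    box_token: str  # held by the gateway only; never returned to agents
    tools: dict[str, tuple[str, Callable[..., Any]]] = field(default_factory=dict)
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, name: str, permission: str, handler: Callable[..., Any]) -> None:
        self.tools[name] = (permission, handler)

    def call(self, agent_id: str, agent_keys: set[str], tool: str, **args: Any) -> Any:
        permission, handler = self.tools[tool]
        allowed = permission in agent_keys
        # Record every attempt, allowed or not: who, what, when, with which arguments.
        self.audit_log.append({
            "agent": agent_id,
            "tool": tool,
            "args": args,
            "allowed": allowed,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        if not allowed:
            raise PermissionError(f"{agent_id} lacks {permission} for {tool}")
        # The handler receives the token; the agent only receives the result.
        return handler(self.box_token, **args)


def fake_box_search(token: str, query: str) -> list[str]:
    """Stand-in for a real Box search request (assumption, not the Box API)."""
    return [f"result for {query!r}"]


gateway = Gateway(box_token="secret-held-by-gateway")
gateway.register("box_search", "box.read", fake_box_search)
```

An agent holding `box.read` gets results from `gateway.call(...)`; an agent without it gets a `PermissionError`, and both attempts land in `audit_log`.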
That is the architecture. The rest is what it lets you build.
What’s actually in Box
The mistake most “AI for the enterprise” projects make is treating the content store as a homogeneous bucket of files. Box is more layered than that, and each layer is useful for different agent work.
The folder tree
The familiar part. Folders, subfolders, files. Permissions inherit. Box’s collaborator model lets you express “this user can view, that group can edit, the agent service account can only read this subtree.”
The agent value here is mechanical: agents can list_folder, traverse_path, and find_files_matching_pattern without an extra search layer. For workflows that follow a known folder structure (Customers / ACME / Contracts / 2026 / *.pdf), traversal is fast and predictable.
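To make the traversal idea concrete, here is a small sketch that matches a slash-separated pattern against a folder tree. The nested dictionary is a hypothetical in-memory stand-in for a Box folder listing, and `find_files` is an illustrative helper, not one of the tools named above.

```python
from fnmatch import fnmatch

# Hypothetical in-memory stand-in for a Box folder tree:
# folders are dicts, files are None.
TREE = {
    "Customers": {
        "ACME": {
            "Contracts": {
                "2026": {"msa.pdf": None, "sow-01.pdf": None, "notes.docx": None},
                "2025": {"msa-old.pdf": None},
            }
        }
    }
}


def find_files(tree: dict, pattern: str) -> list[str]:
    """Return file paths matching a slash-separated glob such as 'A/B/*.pdf'."""
    parts = pattern.split("/")
    matches: list[str] = []

    def walk(node: dict, depth: int, prefix: list[str]) -> None:
        for name, child in sorted(node.items()):
            if not fnmatch(name, parts[depth]):
                continue
            path = prefix + [name]
            last = depth == len(parts) - 1
            if last and child is None:
                # Final pattern segment matched a file.
                matches.append("/".join(path))
            elif not last and isinstance(child, dict):
                # Intermediate segment matched a folder; descend one level.
                walk(child, depth + 1, path)

    walk(tree, 0, [])
    return matches
```

Because each pattern segment prunes the walk at its own level, only folders on a matching path are ever listed, which is what keeps known-structure traversal predictable.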
The metadata layer
Box’s metadata templates are the structured layer most teams underuse. A metadata template is a typed schema attached to a file: fields like Customer ID (string), Contract Value (number), Renewal Date (date), Status (enum). Once a file has metadata, it is queryable as structured data – you can ask “every contract with renewal date in the next 60 days where Status is ‘open’” without reading file bodies.
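The renewal query can be sketched as a filter over typed records. This runs against local data only; the `ContractMetadata` class and `renewals_due` function are hypothetical names that mirror the example template fields, and a real deployment would push the filter to Box rather than loop in the agent.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ContractMetadata:
    """Hypothetical mirror of the fields on a contract metadata template."""
    file_name: str
    customer_id: str
    contract_value: float
    renewal_date: date
    status: str


def renewals_due(records, today: date, window_days: int = 60, status: str = "open"):
    """Return records with the given status renewing within the window, soonest first."""
    cutoff = today + timedelta(days=window_days)
    due = [
        r for r in records
        if r.status == status and today <= r.renewal_date <= cutoff
    ]
    return sorted(due, key=lambda r: r.renewal_date)
```

No file body is read at any point: the question is answered entirely from the structured fields.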
Most Box tenants have metadata templates defined but unevenly applied. Documents from before the template existed are bare. New documents may or may not be tagged depending on whether a human remembered. This is where AI agents earn their cost: they read the unstructured file body, extract the fields, and write them back to the metadata.
Box Extract and custom extraction agents
Box shipped Box Extract in January 2026 as a first-party feature. It runs AI-driven extraction against unstructured content and writes the result into a metadata template. Custom Extract Agents (Enterprise Advanced plan) let you describe the extraction task in natural language – “find the contract value, the renewal date, and the responsible party in this contract PDF” – and Box handles the model invocation and the metadata write.
For WorkingAgents customers, this matters because the integration does not need to reinvent extraction. WA tools can wrap Box Extract: a box_extract_to_template call routes through WorkingAgents permission gating, hits the Box Extract API, returns the extracted fields. The expensive bit (the model run) happens inside Box’s infrastructure. WorkingAgents adds the access control and the audit trail.
The unstructured layer
Below the metadata is the raw content – PDFs, Word docs, slides, spreadsheets, images, video, audio, ZIP archives. For each format, the agent question is the same: can I get the meaningful content out of this file?
Box helps in two ways:
- Box AI can be asked a natural-language question about a file or a folder of files and returns an answer grounded in the file body. Useful for “summarize this 80-page deposition” or “which of these contracts mention indemnification?” without the agent reading every line.
- Box’s representations API generates extracted text, thumbnails, and PDF previews from binary formats. Want the text of a scanned PDF? Box has already OCR’d it. Want a thumbnail of a CAD file? Box has rendered one. Both are addressable as URLs, with no extra processing on the WA side.
These are useful primitives. An agent that wants to “scan the design assets folder and find every image with the old logo” doesn’t need to download every full-resolution TIFF – it can pull the rendered JPEG representation and run vision on that.
The WorkingAgents tool surface
A first-pass set of WA tools for Box, gated by per-tool permission keys following the existing AccessControlled pattern:
| Tool | Permission | Purpose |
|---|---|---|
| box_search | box.read | Full-text and metadata search across the tenant |
| box_list_folder | box.read | Folder listing with file metadata |
| box_get_file | box.read | Download file content or representation (text / thumbnail / PDF) |
| box_get_metadata | box.read | Read structured metadata on a file |
| box_ai_ask | box.ai.ask | Natural-language question routed to Box AI grounded in a file or folder |
| box_extract_to_template | box.write.metadata | Run Box Extract against a file and write fields to a metadata template |
| box_upload_file | box.write.content | Upload a new file to a specified folder |
| box_create_metadata_template | box.admin | Define a new metadata template |
| box_share_link | box.share | Generate a shared link with the specified access level |
Splitting read, AI-query, metadata-write, content-write, sharing, and admin into separate keys is the point. A research agent gets box.read and box.ai.ask only. A pipeline that auto-tags documents gets box.read and box.write.metadata. A consumer-facing chatbot that returns a shared link gets box.read and box.share but never box.write.content. The customer keeps the keys to their own kingdom.
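As a sketch of how those key sets translate into callable tools, the mapping below copies the tool and permission names from the table. The `AGENT_KEYS` role names and the `allowed_tools` helper are hypothetical and only illustrate the split.

```python
# Tool-to-permission mapping, copied from the tool table in this section.
TOOL_PERMISSIONS = {
    "box_search": "box.read",
    "box_list_folder": "box.read",
    "box_get_file": "box.read",
    "box_get_metadata": "box.read",
    "box_ai_ask": "box.ai.ask",
    "box_extract_to_template": "box.write.metadata",
    "box_upload_file": "box.write.content",
    "box_create_metadata_template": "box.admin",
    "box_share_link": "box.share",
}

# Hypothetical role names; the key sets are the three examples from the text.
AGENT_KEYS = {
    "research": {"box.read", "box.ai.ask"},
    "auto_tagger": {"box.read", "box.write.metadata"},
    "chatbot": {"box.read", "box.share"},
}


def allowed_tools(keys: set[str]) -> list[str]:
    """List the tools an agent can call given the permission keys it holds."""
    return sorted(tool for tool, perm in TOOL_PERMISSIONS.items() if perm in keys)
```

Calling `allowed_tools(AGENT_KEYS["chatbot"])` returns the four read tools plus `box_share_link`, and nothing that writes content or metadata.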
What agents actually do with this
Concrete patterns, not aspirational ones:
Contract management
The agent watches a Box folder for new contract PDFs. On each arrival:
- Pulls the text representation (box_get_file with type=text).
- Runs Box Extract against a Contract metadata template (box_extract_to_template).
- If the extracted Renewal Date is within 90 days, fires a notification via the customer’s chosen channel (Slack, email, internal task system).
- If the extracted Contract Value is over a configurable threshold, posts a summary into a review folder for human approval before further action.
Total agent code: a few dozen lines. The hard work (OCR, parsing, structured extraction) lives in Box. The access control lives in WorkingAgents. The agent is glue.
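The decision part of that glue might look like the sketch below. It assumes extraction has already produced typed fields; `ExtractedContract`, `triage_contract`, and the action names are hypothetical, and treating an already-past renewal date as outside the window is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ExtractedContract:
    """Fields assumed to come back from extraction; names are illustrative."""
    file_id: str
    renewal_date: date
    contract_value: float


def triage_contract(
    contract: ExtractedContract,
    today: date,
    value_threshold: float,
    renewal_window_days: int = 90,
) -> list[str]:
    """Return the follow-up actions for one newly extracted contract."""
    actions: list[str] = []
    window_end = today + timedelta(days=renewal_window_days)
    # Assumption: a renewal date already in the past does not trigger a notice.
    if today <= contract.renewal_date <= window_end:
        actions.append("notify_renewal")
    if contract.contract_value > value_threshold:
        actions.append("post_summary_for_review")
    return actions
```

The function only decides; sending the notification and posting the summary would go through whichever gated tools the agent holds keys for.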
Knowledge base for support
A customer support agent answers a user question. Before answering, it issues box_search for related documents in the support-content folder, then box_ai_ask to ground its answer in the matched files. The reply cites the Box file URL so the agent never claims something the source doesn’t say. If the user asks for “the full procedure,” the agent returns a Box shared link via box_share_link rather than pasting the text.
Permission scope on the agent: box.read, box.ai.ask, box.share. Cannot upload, cannot modify metadata, cannot escalate.
Marketing asset discovery
A marketing agent has been asked for “every photo we have of the new product line in outdoor settings.” It issues box_search filtered to the Marketing / Photos / 2026 folder, then iterates through results pulling the thumbnail representation (box_get_file with type=thumbnail), runs a vision model on each, and returns a ranked list. Full-resolution downloads only happen on the human’s pick.
Bandwidth and cost stay low because thumbnails are tiny and Box-rendered.
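A sketch of the ranking step, with the vision model left as a pluggable callable. The `rank_by_thumbnail` function, the `min_score` cutoff, and the shape of the `thumbnails` mapping (file ID to thumbnail bytes) are assumptions for illustration; no particular vision API is implied.

```python
from typing import Callable


def rank_by_thumbnail(
    thumbnails: dict[str, bytes],
    score: Callable[[bytes], float],
    min_score: float = 0.5,
) -> list[tuple[str, float]]:
    """Score each thumbnail and return (file_id, score) pairs, best first."""
    scored = [(file_id, score(data)) for file_id, data in thumbnails.items()]
    # Drop weak matches so the human only reviews plausible candidates.
    keep = [item for item in scored if item[1] >= min_score]
    return sorted(keep, key=lambda item: item[1], reverse=True)
```

Only the file IDs in the returned list are candidates for a later full-resolution download.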
Invoice and AP processing
Invoice PDFs land in a Box folder via email-to-Box ingestion. A WorkingAgents pipeline:
- Detects the new file via Box webhook (relayed to WA).
- Runs box_extract_to_template against an Invoice template (vendor, amount, due date, line items).
- Cross-references vendor against an existing structured list (held in the customer’s CRM or in WA’s own Sqler).
- If recognized and the amount is below the auto-approve threshold, posts to the AP system. If above, routes to a human review folder.
- Writes the agent’s decision and reasoning back as a comment on the Box file – preserving the audit trail in the same content store where the invoice lives.
This is the pattern that pays for the integration. AP teams spend hours doing this manually.
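The routing rule in that pipeline can be sketched as a pure function. `Invoice`, `Decision`, `Route`, and `route_invoice` are hypothetical names. Two choices here are assumptions the text does not specify: unrecognized vendors go to human review, and an amount exactly at the threshold is treated as needing review.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    POST_TO_AP = "post_to_ap"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount: float


@dataclass(frozen=True)
class Decision:
    route: Route
    reason: str  # the text that would be written back as a comment on the file


def route_invoice(invoice: Invoice, known_vendors: set[str], auto_approve_limit: float) -> Decision:
    """Decide where an extracted invoice goes and say why."""
    vendor_key = invoice.vendor.strip().lower()
    # Assumption: an unrecognized vendor is never auto-approved.
    if vendor_key not in known_vendors:
        return Decision(Route.HUMAN_REVIEW, f"Vendor {invoice.vendor!r} is not on the vendor list.")
    # Assumption: only amounts strictly below the limit are auto-approved.
    if invoice.amount < auto_approve_limit:
        return Decision(Route.POST_TO_AP, f"Known vendor; {invoice.amount:.2f} is below {auto_approve_limit:.2f}.")
    return Decision(Route.HUMAN_REVIEW, f"Known vendor; {invoice.amount:.2f} is at or above {auto_approve_limit:.2f}.")
```

Returning the reason alongside the route means the same string can be stored with the invoice, so the explanation and the decision cannot drift apart.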
Personal productivity, not just enterprise
The same primitives work for a single user. The Personal Box plan (or a Business seat at a small company) has the same API surface. An individual agent setup can:
- Watch an “Inbox” folder for scanned documents and file them automatically based on extracted content.
- Maintain a Box-stored knowledge base that a personal assistant agent searches before answering questions.
- Auto-summarize long PDFs dropped into a “Read Later” folder into a short markdown summary, written back to a sibling folder.
The agent code is small. The Box-side smarts and the WorkingAgents access control do the heavy lifting.
Structured vs unstructured: the actual workflow
A practical lens for any agent-on-Box workflow:
- Start with structure if there is any. Search metadata first. It’s faster, cheaper, and more deterministic than reading file bodies.
- Extract structure when there isn’t. Box Extract via box_extract_to_template turns unstructured content into queryable metadata. Run extraction once, query the metadata many times.
- Reach into file bodies only when structure isn’t enough. Use box_ai_ask for narrow questions on specific files. It’s the most expensive call. Don’t ask it questions the metadata could answer.
- Surface results back as structure. If the agent learns something new from the file body, write it to a metadata template. The next agent shouldn’t have to re-derive what the first one already figured out.
This four-step pattern is what “AI for unstructured data” looks like when done correctly. The pile gets smaller every pass.
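The four steps can be sketched as one lookup function that tries the cheapest source first. Everything here is illustrative: `resolve_field` is a hypothetical helper, the `metadata` dictionary stands in for stored template values, and `extract` and `ask` are caller-supplied callables standing in for the extraction and AI-question tools.

```python
from typing import Callable, Optional


def resolve_field(
    file_id: str,
    field_name: str,
    metadata: dict[str, dict[str, str]],
    extract: Callable[[str], dict[str, str]],
    ask: Callable[[str, str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (value, source) using the cheapest source that has the answer."""
    fields = metadata.setdefault(file_id, {})
    # 1. Start with structure if there is any.
    if field_name in fields:
        return fields[field_name], "metadata"
    # 2. Extract structure when there isn't; keep everything extraction returns.
    fields.update(extract(file_id))
    if field_name in fields:
        return fields[field_name], "extract"
    # 3. Reach into the file body only when structure isn't enough.
    answer = ask(file_id, field_name)
    if answer is not None:
        # 4. Surface the result back as structure for the next caller.
        fields[field_name] = answer
    return answer, "ask"
```

A second request for the same field is served from `metadata` without touching either callable. A fuller version would also remember which files have already been extracted, so that step 2 runs once per file rather than once per missing field.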
Implementation realities
Honest assessment of where this integration earns its cost and where it doesn’t:
Where it earns its cost
- High-volume document processing (invoices, contracts, applications, claims).
- Knowledge-base grounding for support and sales agents.
- Asset discovery in image, video, and design libraries.
- Compliance workflows where the audit trail matters more than the speed.
- Migration / cleanup work, where extracted metadata accumulates and previously-undiscoverable files become discoverable.
Where it doesn’t
- Tiny content stores. If the customer has 200 files, a folder-walk script is fine. The integration overhead isn’t worth it.
- Real-time interactive workflows where Box API latency (typically hundreds of milliseconds for reads, longer for AI calls) is too slow. Cache or move the content closer.
- Highly regulated content where the customer cannot let any agent (even a permission-gated one) touch certain files. Use Box’s existing collaborator model to wall those off from the agent service account before WorkingAgents ever sees them.
Things to plan for
- Authentication. Box uses OAuth 2.0 with optional JWT for service accounts. The WorkingAgents-side integration holds the credentials and refreshes tokens. The agent never authenticates to Box.
- Rate limits. Box’s API has per-user and per-application limits. A pipeline that hits Box hundreds of times a minute needs back-pressure – WA’s per-tool key allows budgeting at the permission level.
- Cost. Box AI calls and Box Extract are billed separately from base seats on Enterprise Advanced. Track usage per WA token to attribute cost back to specific agents or customers.
- Versioning. Box keeps file versions. An agent that writes a summary back as a comment, not a new version, avoids cluttering history. A pipeline that updates an existing document creates a version – which is the right behavior for audit but worth knowing.
- Webhook relay. Box’s webhooks fire to a URL you specify. WorkingAgents needs an endpoint to receive them and route into the agent pipeline. Standard Plug/Phoenix pattern; not exotic, but a piece to plan.
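For the rate-limit item, one common way to apply back-pressure is a token bucket per permission key. The sketch below is a generic token bucket, not a description of how WorkingAgents budgets calls; the `TokenBucket` class and the injectable `clock` parameter are assumptions made so the behavior is easy to test.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Spend `cost` tokens if available; return False instead of blocking."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical budgets: one bucket per permission key, sized independently.
BUDGETS = {
    "box.read": TokenBucket(rate_per_sec=5.0, capacity=20.0),
    "box.ai.ask": TokenBucket(rate_per_sec=0.5, capacity=5.0),
}
```

When `try_acquire` returns `False`, the caller can queue or delay the tool call rather than forwarding it to Box. The numbers above are placeholders, not Box limits.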
Bottom line
Box is where a lot of enterprise content actually lives. The reason agents-on-Box rarely ship today is not the agents and not Box – it’s the access layer. Hand an agent a Box service account and you’ve created a security incident waiting to happen. Hand it no access and it can’t do useful work.
WorkingAgents is the missing layer. Per-tool permission keys, audit logging, capability scoping for each agent. Box’s API and Box’s AI primitives do the actual content work. The combination lets a customer keep their content where it is, give agents the access they need and nothing more, and get an audit trail of every interaction.
For an AI consulting practice, this is one of the cleaner first deals to ship: the customer already has Box, the value (auto-tagging, search grounding, document processing) is measurable in hours saved per week, and the access control story is the part that lets a CIO say yes.