Andrej Karpathy’s autoresearch pattern – an AI agent that runs experiments, measures results, logs learnings, and iterates – was built for ML research. But the same loop works for any business problem where you have a metric, something you can change, and an API to connect them.
A recent breakdown by IndyDevDan walks through a concrete list of business domains where this pattern applies today. The core idea: you don’t need a research lab. You need a clear metric, a controllable input, and a programmable surface.
The General Pattern
Every autoresearch loop has three requirements:
- An objective metric that goes up or down – reply rate, conversion rate, revenue, CSAT, views
- A controllable input the agent is allowed to change – copy, layout, pricing, templates
- An API or programmable surface the agent can call to both read metrics and apply changes
The loop:
- Start from a baseline and formulate a hypothesis
- Generate a challenger (change the input)
- Run the experiment (deploy both, gather metrics)
- Keep the winner, log learnings, generate a new challenger, repeat
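The loop above can be sketched in a few lines of Python. This is a toy version with the two hard parts stubbed out: `measure` would really read a metric from an API, and `generate_challenger` would really be an LLM writing a new variant biased by the learnings log. Here they are replaced with a length heuristic and a random word-drop mutation so the sketch runs standalone.

```python
import random

random.seed(0)

def measure(variant: str) -> float:
    """Stub metric: in practice this reads reply rate (etc.) from an API.
    Toy heuristic here: shorter copy scores higher."""
    return 1.0 / (1 + len(variant))

def generate_challenger(baseline: str, learnings: list[str]) -> str:
    """Stub generator: in practice an LLM writes a new variant,
    biased by the learnings log. Toy mutation: drop a random word."""
    words = baseline.split()
    if len(words) > 1:
        words.pop(random.randrange(len(words)))
    return " ".join(words)

def autoresearch(baseline: str, iterations: int = 5):
    learnings: list[str] = []
    best_score = measure(baseline)          # start from a baseline
    for _ in range(iterations):
        challenger = generate_challenger(baseline, learnings)
        score = measure(challenger)          # run the experiment
        if score > best_score:               # keep the winner
            learnings.append(f"'{challenger}' beat '{baseline}'")
            baseline, best_score = challenger, score
    return baseline, best_score, learnings
```

Everything domain-specific lives in `measure` and `generate_challenger`; the outer loop is the same whether the variant is an email, a landing page, or a price point.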
Fast feedback loops work dramatically better than slow ones. Five-minute ML training runs, hourly email batches, high-volume ad campaigns – these are ideal. Weekly or monthly cycles make iteration painfully slow.
Where It Works
Cold Email Copy
Optimize reply rate by changing email copy – subject lines, offer framing, length, CTAs. The agent generates new variants, deploys them via the email platform’s API (e.g., Instantly), waits for performance data, then harvests results. Learnings get logged: “sub-75-word emails with explicit risk-reversal and concrete time asks perform better.” Each new challenger is biased toward what worked.
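The "log learnings, bias the next challenger" step might look like the following sketch. The log entry structure and the prompt wording are assumptions, not the video's implementation; the actual deployment calls to the email platform (e.g., Instantly) are omitted.

```python
def log_learning(log: list[dict], hypothesis: str, result: str) -> None:
    """Append a structured learning so future challengers can build on it."""
    log.append({"hypothesis": hypothesis, "result": result})

def challenger_prompt(baseline: str, log: list[dict], keep_last: int = 5) -> str:
    """Build an LLM prompt that biases the next variant toward logged wins.
    In a real system this prompt goes to the model that writes the copy."""
    notes = "\n".join(f"- {e['hypothesis']}: {e['result']}" for e in log[-keep_last:])
    return (
        "Rewrite this cold email as a new challenger.\n"
        f"Baseline:\n{baseline}\n"
        f"Learnings from past experiments:\n{notes or '- none yet'}"
    )
```

Keeping the log as structured entries (rather than free text) makes it easy to cap the prompt at the most recent or most significant learnings.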
Landing Page Conversion
Optimize conversion rate by changing page structure – headlines, sections, CTAs, social proof blocks. The agent edits the page via hosting API, deploys variants, runs them until enough traffic accumulates, compares against the baseline, keeps the winner.
Ad Creatives
Optimize click-through or purchase rate by changing creative assets and copy. Many ad platforms already auto-optimize, but the agent calls the ad platform API directly – reading performance metrics, generating new creative variants, and continuously testing challengers against baselines. The difference: the agent’s hypothesis log accumulates domain knowledge that platform algorithms don’t expose.
Chatbot and Support Scripts
Optimize CSAT or resolution rate by changing response templates. The agent tweaks the base script that both human and AI agents use, observes downstream satisfaction metrics, and keeps more successful variants. Over time the master script evolves toward what actually resolves issues.
Product Descriptions
Optimize sales volume or product-page conversion by changing description copy. If no API exists, a Chrome DevTools MCP or CLI-scripted browser flow can drive the web UI. The agent updates descriptions, waits for sales data, then iterates toward variants that sell more.
YouTube Titles
Optimize views, CTR, or watch-time by changing video titles. The agent uses YouTube Data API to fetch performance data, experiments with new titles, tracks performance deltas, and keeps better-performing variants.
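A decision rule for titles might normalize views by time live so an older title isn't unfairly favored. In the sketch below, `viewCount` matches the real YouTube Data API v3 statistics field, but `days_live` is a value you would compute yourself from the publish date, and the 5% margin is an arbitrary noise guard, not anything the API provides.

```python
def views_per_day(stats: dict) -> float:
    """Normalize raw view counts by days live so titles of different
    ages compare fairly. 'days_live' is computed by the caller."""
    return int(stats["viewCount"]) / max(stats["days_live"], 1)

def keep_title(baseline_stats: dict, challenger_stats: dict,
               margin: float = 1.05) -> bool:
    """Adopt the challenger title only if it beats the baseline by a
    margin, to avoid chasing noise."""
    return views_per_day(challenger_stats) > margin * views_per_day(baseline_stats)

# Fetching the real numbers would look roughly like this
# (google-api-python-client, YouTube Data API v3):
#   from googleapiclient.discovery import build
#   yt = build("youtube", "v3", developerKey=API_KEY)
#   resp = yt.videos().list(part="statistics", id=VIDEO_ID).execute()
#   stats = resp["items"][0]["statistics"]  # contains "viewCount"
```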
Newsletter Subject Lines
Optimize open rate or click-through by changing subject lines and preview text. The agent generates new subject lines for the same content, deploys A/B tests via the email service provider’s API, and adopts the winning variant as the new baseline.
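"Adopts the winning variant" should mean statistically winning, not just ahead. A standard two-proportion z-test on open rates is one way to gate promotion; the 1.96 threshold (~95% confidence) is a conventional choice, not something prescribed by the pattern.

```python
import math

def z_score(opens_a: int, sends_a: int, opens_b: int, sends_b: int) -> float:
    """Two-proportion z-test on open rates; positive favors variant B."""
    pa, pb = opens_a / sends_a, opens_b / sends_b
    p = (opens_a + opens_b) / (sends_a + sends_b)   # pooled rate
    se = math.sqrt(p * (1 - p) * (1 / sends_a + 1 / sends_b))
    return (pb - pa) / se

def adopt_challenger(opens_a: int, sends_a: int,
                     opens_b: int, sends_b: int,
                     threshold: float = 1.96) -> bool:
    """Promote the challenger subject line only at ~95% confidence."""
    return z_score(opens_a, sends_a, opens_b, sends_b) > threshold
```

Without a gate like this, the loop happily promotes noise and the baseline drifts randomly instead of improving.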
Pricing Pages
Optimize conversion to paid or revenue per visitor by changing price points, plan names, feature bundling, and secondary copy. The agent edits the pricing page via API, monitors metrics, and moves the design toward higher conversion.
A Concrete Implementation
The worked example in the video is a cold email optimizer with this structure:
- Orchestrator: top-level agent prompt that coordinates sub-agents, API calls, and storage
- Utility scripts: one-off tools for purging leads, batch deployment, parser tests
- Configs: baseline campaign definition, a resource.md file with accumulated best-practice notes, API tokens
- Scheduled runner: GitHub Actions workflow that runs the optimization loop on a schedule (e.g., hourly)
The loop stages for email:
- Harvest – pull performance metrics from the email platform
- Generate – create new challenger email copy based on logged learnings
- Deploy – create campaigns, assign leads, activate automatically
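The harvest stage reduces whatever the platform returns to one comparable number per campaign. A sketch, with illustrative field names (map them to your platform's actual response schema):

```python
def harvest(raw_campaigns: list[dict]) -> dict[str, float]:
    """Reduce raw per-campaign analytics to a comparable reply rate.
    'emails_sent' and 'replies' are placeholder field names."""
    rates: dict[str, float] = {}
    for c in raw_campaigns:
        sent = c.get("emails_sent", 0)
        rates[c["name"]] = c.get("replies", 0) / sent if sent else 0.0
    return rates
```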
Slack webhooks push every new challenger/baseline test and harvest result into a channel for visibility.
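Slack incoming webhooks take a JSON POST with a "text" field, so the notification piece is small. The message format below is our own invention; only the webhook mechanics are Slack's.

```python
import json
import urllib.request

def notify_slack(webhook_url: str, text: str) -> None:
    """Post a one-line update to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def format_result(challenger: str, baseline_rate: float,
                  challenger_rate: float) -> str:
    """Render a harvest result as a readable Slack message."""
    verdict = "WINNER" if challenger_rate > baseline_rate else "no improvement"
    return (f"Challenger '{challenger}': {challenger_rate:.1%} "
            f"vs baseline {baseline_rate:.1%} -- {verdict}")
```

Pushing every test and harvest into a channel costs almost nothing and makes the unattended loop auditable at a glance.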
When It Doesn’t Work
Slow feedback. If each experiment takes weeks or months to yield data, iteration is painfully slow and the loop loses its advantage.
Fuzzy metrics. Goals like “warmth” or “brand perception” are too subjective unless you proxy them via concrete scales or analytics. The metric must be objective and measurable.
No programmable access. If there’s no API or scripted way to apply changes, you’re back to manual work and lose the automation benefit entirely.
Why This Matters Now
This is essentially what large ML labs already do – massive background experiment grids running continuously. The difference is that Claude Code plus the autoresearch pattern makes it accessible to solo builders and small businesses. You don’t need infrastructure. You need a metric, an API, and patience to let the loop run.
The pattern is general enough that any business process with a measurable outcome and a controllable input is a candidate. The question isn’t whether autoresearch works for your domain. The question is whether your feedback loop is fast enough to make iteration worthwhile.