Seven Types of "Hard" Problems (Most Aren't Reasoning)

By James Aspinwall — February 2026


We talk about AI solving “hard” problems as if difficulty were a single axis. It isn’t. Most of the difficult work knowledge workers do each week is not hard because of pure reasoning. Break “hard” open and seven distinct dimensions fall out, each mapping differently to what current AI can and cannot do.

Understanding these categories matters because it changes how you deploy AI. Throwing a reasoning model at an effort problem is like hiring a chess grandmaster to do data entry. Technically capable, wildly misallocated.


1. Reasoning Problems

These are problems where you must juggle many variables and follow non-obvious logical chains to a clear, verifiable answer. Multi-jurisdiction tax optimization. Complex derivative pricing. Novel regulatory classification of financial instruments that didn’t exist when the rules were written. Designing a distributed system that must satisfy twelve competing constraints simultaneously.

The signature of a reasoning problem is that the answer is objectively checkable once found, but the path to finding it is combinatorially vast. You can’t brute-force it with effort — you need to prune the search space intelligently.
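To make “pruning the search space” concrete, here is a minimal, purely illustrative Python sketch; the tasks, slots, and conflict pairs are invented for the example. Brute force checks every one of the 65,536 complete assignments, while backtracking abandons a partial assignment the moment it violates a constraint, so whole subtrees of the search space are never visited.

# Illustrative only: a toy constraint problem showing why pruning beats brute force.
# Assign one of 4 time slots to each of 8 tasks so that no two conflicting
# tasks share a slot. All names and numbers are invented for the example.
from itertools import product

SLOTS = range(4)
TASKS = range(8)
CONFLICTS = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (0, 7)}

def consistent(assignment):
    # Checks only the constraints whose tasks are already assigned,
    # so it works on partial assignments too.
    return all(assignment[a] != assignment[b]
               for a, b in CONFLICTS
               if a < len(assignment) and b < len(assignment))

def brute_force():
    # Enumerates all 4**8 = 65,536 complete assignments.
    return next(a for a in product(SLOTS, repeat=len(TASKS)) if consistent(a))

def backtrack(assignment=()):
    # Prunes: gives up on a partial assignment as soon as it breaks a
    # constraint, never descending into the doomed subtree below it.
    if not consistent(assignment):
        return None
    if len(assignment) == len(TASKS):
        return assignment
    for slot in SLOTS:
        result = backtrack(assignment + (slot,))
        if result is not None:
            return result
    return None

Real reasoning problems are vastly larger than this toy, which is exactly the point: without intelligent pruning, the enumeration never finishes.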

Deep reasoning models like Gemini 3.1 Pro (especially at high or max “thinking levels”) materially change what’s tractable here. They can hold more variables in play, explore longer logical chains, and spot contradictions that humans miss under cognitive load. This is the category where “smarter model” directly translates to “better outcome.”

But here’s the thing: across a typical knowledge worker’s week, the genuinely reasoning-bottlenecked slice is often small — perhaps 10%. Most people hit a true reasoning wall a few times a month, not a few times a day.


2. Effort Problems

Effort problems are not intellectually difficult. They are just enormous. Auditing thousands of contracts for a single clause. Migrating millions of lines of legacy code from one framework to another. Reviewing every customer interaction from the last quarter to find churn signals. Reconciling data across fourteen systems that were never designed to talk to each other.

The challenge is sustained attention and thoroughness, not cleverness. A junior analyst could do each individual step. The problem is that there are ten thousand steps and the human attention span is not built for that.

These are exactly the kinds of problems agentic models like Opus 4.6, running for hours or days, are built to attack. An agent that can methodically work through a massive codebase, opening files, understanding context, making changes, running tests, and moving on — that’s not reasoning, that’s endurance. And machines don’t get bored, don’t lose focus at 3pm, and don’t skip steps because they’re tired.
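As a sketch of what that endurance loop looks like (a hypothetical skeleton, not how Opus 4.6 or any vendor’s agent actually works), an effort-problem agent is essentially a bounded loop over work items with verification at each step. migrate_file and run_tests below are invented stand-ins:

# Hypothetical skeleton of an effort-problem agent: not a real API,
# just the shape of the loop.
from pathlib import Path

def run_tests() -> bool:
    """Stand-in for the project's test suite; assumed to exist."""
    return True

def migrate_file(source: str) -> str:
    """Stand-in for the model call that rewrites one file."""
    return source

def migrate_codebase(root: Path) -> None:
    failures = []
    for path in sorted(root.rglob("*.py")):
        original = path.read_text()
        path.write_text(migrate_file(original))
        if run_tests():
            continue                      # step verified, move on
        path.write_text(original)         # revert and record, never skip silently
        failures.append(path)
    # The agent's advantage is not cleverness at any single step,
    # but that step 9,000 gets the same treatment as step 1.
    print(f"migration done; {len(failures)} files need human review")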

This is where the bulk of near-term AI value lives for most organisations. Not brilliant insights — reliable throughput on work that humans find soul-crushing.


3. Coordination Problems

Coordination work involves aligning teams, managing dependencies, and ensuring information flows to the right places at the right times. It’s not about knowing the answer — it’s about making sure the right people know the right things and act in the right sequence.

The Rakuten deployment of Opus 4.6 illustrates this: the model autonomously closed issues and routed them across a 50-person engineering organisation spanning multiple repositories. It exhibited a kind of organisational awareness: understanding who owns what, what blocks what, and what needs to happen next.

Coordination problems are deceptive because they feel like they should be easy. Each individual action is simple: send a message, update a ticket, assign a task. The difficulty is maintaining a mental model of a complex, evolving system of human commitments and technical dependencies. It’s the kind of work that senior managers spend most of their time on, and it’s the kind of work that scales badly with team size.

AI won’t replace the human judgment embedded in coordination — “should we delay the launch to fix this?” is a judgment call. But AI can handle the mechanical layer: tracking what’s blocked, who hasn’t responded, what slipped, and what’s coming up. That alone frees significant cognitive bandwidth.
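As a purely illustrative sketch of that mechanical layer (the task names and fields are invented), the core of blocked-work tracking is just a dependency check over task state:

# Illustrative only: given task dependencies and statuses, compute what is
# blocked and on whom. No real tracker's data model is assumed.
TASKS = {
    "api-migration":  {"status": "in_progress", "depends_on": []},
    "mobile-release": {"status": "todo",        "depends_on": ["api-migration"]},
    "docs-update":    {"status": "todo",        "depends_on": ["mobile-release"]},
}

def blocked(tasks):
    # A task is blocked if any of its dependencies is not yet done.
    return {
        name: [d for d in t["depends_on"] if tasks[d]["status"] != "done"]
        for name, t in tasks.items()
        if any(tasks[d]["status"] != "done" for d in t["depends_on"])
    }

print(blocked(TASKS))
# {'mobile-release': ['api-migration'], 'docs-update': ['mobile-release']}

The computation is trivial; the value is in keeping it continuously up to date across hundreds of tasks, which is precisely the part humans do badly.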


4. Emotional Intelligence Problems

These are situations where success depends on reading people, delivering feedback with nuance, handling conflict, and managing change. Telling a team member their project is being cancelled. Navigating a negotiation where the other side is frustrated but hasn’t said so. Knowing when to push and when to back off.

Today’s models are not reliable here and may be fundamentally limited. They can generate empathetic-sounding text, but there’s a difference between producing the words “I understand how you feel” and actually perceiving the subtle cues — a shift in tone, a hesitation, a facial expression — that tell you how someone is really doing.

The bottleneck is not generating content. It’s perceiving context that isn’t in the text, timing interventions appropriately, and maintaining trust over long relationships. A model that gives technically correct feedback at the wrong moment, in the wrong tone, to the wrong audience makes things worse.

This category is where humans remain irreplaceable, and likely will for a long time. It’s also where the gap between “AI assistant” and “AI replacement” is widest.


5. Judgment and Willpower Problems

Some decisions are hard not because we don’t know the analytically correct answer, but because acting on it is costly or risky. Killing a beloved project that the team has poured two years into. Walking away from revenue that’s misaligned with your strategy. Choosing a politically dangerous but strategically sound path. Firing someone who’s well-liked but underperforming.

These are courage and identity problems. The spreadsheet already tells you what to do. The hard part is doing it.

AI cannot solve these because the constraint is human willingness, not computation. A model can lay out the analysis, quantify the trade-offs, and even recommend the difficult path. But the moment of decision — where a leader puts their reputation and relationships on the line — that’s irreducibly human.

Where AI helps is in removing the ambiguity that people use as an excuse to avoid hard decisions. “We don’t have enough data” is often code for “I don’t want to face this.” An AI that quickly produces the data eliminates the hiding place, which paradoxically makes the human judgment problem harder, not easier.


6. Domain Expertise Problems

A senior engineer who’s been in the codebase for five years is faster not because of inherently better reasoning but because they’ve seen this movie before. They recognise patterns, recall prior incidents, know which details actually matter and which are noise, and have calibrated intuitions about what will go wrong.

Similarly, a seasoned attorney doesn’t reason through contract law from first principles each time. They pattern-match against hundreds of prior contracts, spot the unusual clause immediately, and know from experience which apparently innocuous terms will cause problems in eighteen months.

Models can approximate this via training data. A model trained on millions of contracts has “seen” more contracts than any human attorney. But there remains a gap between “has read about it” and “has lived it,” especially in domains with thin public documentation — internal company processes, niche regulatory environments, systems with undocumented behaviour.

The practical implication: AI is most immediately useful in domains with rich public documentation (law, medicine, software) and least useful in domains where expertise is tacit, oral, or locked behind proprietary walls.


7. Ambiguity Problems

Finally, some problems are hard because the question itself is unclear. Defining strategy when the data admits multiple contradictory interpretations. Deciding what to build when customer requests are conflicting and their stated needs don’t match their revealed behaviour. Inferring what the board actually wants when the directive is “grow profitably” — two words that are in tension by definition.

Ambiguity problems can’t be solved by better reasoning because the reasoning has nothing firm to anchor to. The inputs are contested, the success criteria are undefined, and the “right” answer depends on values and priorities that haven’t been articulated.

Models can help explore scenarios, generate options, and stress-test assumptions. But they cannot eliminate uncertainty that is inherently non-computable or political. When the CEO and the CFO want different things and neither has said so explicitly, no amount of computational power resolves the ambiguity. That requires a human conversation, often an uncomfortable one.


The Distribution That Matters

Here’s why this taxonomy matters for how you adopt AI:

Problem Type            % of Knowledge Work   AI Readiness
Reasoning               ~10%                  High (deep thinking models)
Effort                  ~30%                  Very high (agentic models)
Coordination            ~20%                  Medium (improving rapidly)
Emotional Intelligence  ~10%                  Low (fundamental limits)
Judgment/Willpower      ~10%                  None (irreducibly human)
Domain Expertise        ~10%                  Medium (domain-dependent)
Ambiguity               ~10%                  Low (can assist, not resolve)

The genuinely reasoning-bottlenecked slice of most people’s work is small. The bulk of difficulty comes from effort, coordination, ambiguity, and human factors. That is why an agentic workhorse like Opus 4.6 may see more day-to-day use in offices than a deep reasoning model — it attacks the 30% of work that is hard purely because of volume, not complexity.

Meanwhile, Google is comfortable owning the occasional but crucial “deep thinking” moment with Gemini’s extended reasoning modes. Different bets on different parts of the same distribution.

The companies that get AI adoption right won’t be the ones that find the smartest model. They’ll be the ones that correctly diagnose which type of “hard” they’re actually facing, and match the right tool to the right problem.


The best butler knows when to think and when to simply do the work. The worst mistake is confusing the two.