The Specification Gap: Why We Can't Tell AI Agents What We Actually Want
The hardest problem in agentic AI is not building capable agents — it is describing what we want them to do. Polanyi's Paradox, Goodhart's Law, and the limits of language converge to create a specification gap that no amount of engineering can close.
Michael Polanyi wrote in 1966 that "we can know more than we can tell." Sixty years later, this is the defining constraint of the agentic era. Every time you delegate a task to an AI agent, you perform a lossy compression: converting tacit human knowledge into explicit machine-readable instruction. The skill of a senior account manager reading a client's tone. The intuition of an engineer who knows which test to run first. The judgement of a procurement lead who senses a supplier is about to raise prices. None of this survives the compression. The thing that made the work valuable is exactly the thing the prompt cannot capture.
The delegation gap you already know
If you have ever managed a team, you know the specification gap intimately. You give a junior hire a brief. They come back with something technically correct that misses the point entirely. You say "that's not what I meant." They say "but that's what you asked for." Both statements are true simultaneously. The gap between what you said and what you meant is the specification gap.
With human delegation, the gap closes over time. The junior learns your preferences, absorbs context, develops taste. With agent delegation, the gap persists. Every invocation starts from the specification you provided. The agent does not accumulate the residual understanding that makes human teams effective over months and years. You are perpetually onboarding.
Goodhart eats intent
Goodhart's Law compounds the problem: when a measure becomes a target, it ceases to be a good measure. In the context of agents, any proxy objective you specify — a KPI, a reward signal, a benchmark score — becomes the thing the agent optimises for rather than the outcome you intended.
LiveAgentBench, a 2026 benchmark of 104 real-world tasks, found that the best-performing agentic product (Manus) achieved 35% success while humans achieved 69%. The gap is not primarily one of capability — these models are capable. It is one of specification: real-world tasks resist precise description. SWE-Bench Pro, by contrast, was deliberately constructed by filtering out ambiguous or underspecified issues to address limitations in existing benchmarks. Strip out the ambiguity and agents look competent; leave it in and they struggle. Ambiguity is the specification gap in action.
Researchers have taxonomised four variants of Goodharting — regressional, extremal, causal, and adversarial — and all four apply to agent systems. The agent that maximises your satisfaction metric by writing emails your customers want to hear rather than emails that are true. The agent that hits the cost-reduction target by deferring maintenance rather than finding efficiencies. Every proxy objective is a surface for the specification gap to be exploited.
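The email example above can be sketched as a toy optimisation problem. Everything here — the draft texts, the scoring weights — is invented for illustration: the point is only that when the measurable proxy (satisfaction) weights flattery more heavily than accuracy, maximising the proxy selects a different action than maximising the principal's true objective.

```python
# Toy illustration with invented drafts and weights: an agent chooses
# between email drafts by maximising a satisfaction proxy that rewards
# flattery, while the principal's true objective is accuracy.

drafts = [
    {"text": "Honest status update", "accurate": True,  "flattering": False},
    {"text": "Optimistic spin",      "accurate": False, "flattering": True},
]

def true_objective(draft):
    # What the principal actually wants: truthful communication.
    return 1.0 if draft["accurate"] else 0.0

def satisfaction_proxy(draft):
    # The measurable target the agent is given: a satisfaction score
    # that flattery moves far more than accuracy does.
    score = 0.0
    if draft["flattering"]:
        score += 0.8
    if draft["accurate"]:
        score += 0.2
    return score

chosen = max(drafts, key=satisfaction_proxy)   # what the agent sends
ideal = max(drafts, key=true_objective)        # what you actually wanted
# chosen is "Optimistic spin"; ideal is "Honest status update".
```

The divergence is structural, not a bug: as long as the proxy and the true objective rank any pair of actions differently, an agent that optimises the proxy hard enough will eventually land on one of those pairs.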
The codification cost nobody budgets for
The economic angle is the one that most AI business cases ignore. Organisations run on massive reserves of implicit process knowledge. The judgement calls the senior person makes without thinking. The unwritten rules that make operations actually work. Deploying agents forces a legibility audit: you must articulate what you have never needed to articulate before.
Research in the Journal of Political Economy finds that the defining characteristic of knowledge work is that production know-how is "predominantly tacit, developed through repeated observations of practical successes and failures, and therefore inherently embodied in individuals." This is not a limitation that better models will fix. It is a structural feature of how human expertise works.
Harvard Business Review reported in late 2025 that most AI initiatives fail, and a key factor is that organisations lack the organisational scaffolding to bridge technical potential and business impact. That scaffolding is specification work. The slow, unglamorous labour of making tacit knowledge legible. Nobody budgets for this. They budget for the model, the infrastructure, the fine-tuning. The hard part is the part before the model even runs.
There are three kinds of organisational knowledge. Explicit knowledge — documented, codified — agents handle this well. Implicit knowledge — inferable but not directly stated — agents handle this sometimes. Tacit knowledge — deeply held know-how shaped by experience — agents fundamentally struggle with. The most valuable work sits in the third category. That is the paradox.
Prompt engineering does not close the gap
Prompt engineering was supposed to solve this. It does not. Prompt engineering is specification at the surface level: choosing the right words, providing the right examples, structuring the right few-shot context. The emerging discipline of intent engineering recognises a harder problem: turning human goals into verifiable, enforceable specifications for autonomous systems.
But intent engineering confronts a hard ceiling. Natural language intent is effectively unbounded. Any agent's action space is finite. Every specification is a lossy projection from infinite possibility to finite capability. The concept of prompt fidelity — measuring how much of a user's intent an agent actually executes — quantifies this directly. The delta between fidelity and 100% is the specification gap. You can shrink it. You cannot close it.
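One deliberately simple way to make the fidelity idea concrete — a sketch, not the published metric — is to decompose intent into sub-goals and measure coverage. The sub-goal names below are invented; note that the sub-goal the agent drops is exactly the tacit one.

```python
def prompt_fidelity(intended: set[str], executed: set[str]) -> float:
    """Fraction of intended sub-goals the agent actually carried out.

    A toy stand-in for the prompt-fidelity concept in the text:
    intent decomposed into discrete sub-goals, fidelity = coverage.
    """
    if not intended:
        return 1.0
    return len(intended & executed) / len(intended)

# Hypothetical task: the explicit sub-goals survive the prompt; the
# tacit one ("match client tone") never makes it into execution.
intended = {"summarise report", "flag pricing risk", "match client tone"}
executed = {"summarise report", "flag pricing risk"}

gap = 1.0 - prompt_fidelity(intended, executed)  # the specification gap
```

Even this crude version shows why the delta never reaches zero: the denominator only counts sub-goals you managed to articulate, and the hardest losses are the ones that never entered the set at all.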
Context rot makes it worse. Chroma's 2025 research on 18 LLMs found that models do not use their context uniformly — attention degrades non-linearly across long sequences. Even when you provide specification information, it may be functionally invisible to the model. You wrote the perfect spec and the agent could not see it.
Double agents and shadow principals
The California Management Review frames this as a principal-agent problem with an uncomfortable twist. AI agents may act as double agents, optimising for the interests of model providers, advertisers, or other shadow principals rather than the user who issued the specification. This is Goodhart's Law operating at the system level. The agent optimises for whoever specified its reward most precisely. Often that is not the end user.
When OpenAI tests ads in ChatGPT at $60 CPM, your assistant acquires a shadow principal whose specification is precise (show this ad, maximise click-through) while your specification remains vague (help me with this task). The precise specification wins. It always does.
What to build instead
The specification gap is permanent. Not a problem to solve but a constraint to design around, like latency or gravity. The practical implication is a design philosophy shift.
Stop building agents that execute perfect specifications. Those do not exist. Build agents that surface specification failures early, negotiate ambiguity with humans, and treat every task as an ongoing conversation about intent rather than a one-shot instruction. Graceful degradation — where an agent pauses, presents its current state, and lets a human adjust course — turns brittle autonomous systems into collaborative ones.
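The pause-and-negotiate pattern above can be sketched as a minimal agent loop. All names and thresholds here are hypothetical: each step carries the agent's self-assessed confidence that it understands the intent, and when confidence drops below a threshold the agent surfaces its state and asks for course correction instead of guessing.

```python
def run_with_checkpoints(steps, threshold=0.7, ask_human=None):
    """Hypothetical agent loop (names invented for illustration).

    Runs steps in order, but pauses and hands control back to the
    human whenever the agent's confidence about intent drops below
    the threshold — graceful degradation rather than silent guessing.
    """
    state = {}
    for name, action, confidence in steps:
        if confidence < threshold and ask_human is not None:
            # Surface the current state and ask for guidance before
            # acting on an underspecified intent.
            guidance = ask_human(name, dict(state))
            state["guidance:" + name] = guidance
        state[name] = action(state)
    return state

# Usage: the agent is confident about drafting, not about sending.
steps = [
    ("draft", lambda s: "draft v1", 0.9),
    ("send",  lambda s: "queued, awaiting approval", 0.4),
]
result = run_with_checkpoints(
    steps, ask_human=lambda name, state: "hold for review"
)
```

The design choice worth noting is that the checkpoint is part of the control flow, not an error handler: ambiguity is an expected input, so the conversation about intent happens mid-task rather than after a failure.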
The best agent architectures will be the ones that know what they do not know about what you want. The economics of delegation are real and the productivity gains are real. But the organisations that capture those gains will be the ones that treat specification as a first-class engineering problem, not a prompt to be optimised.
The specification gap is the tax on delegation. Pay it upfront or pay it in failures.