Working title and alternates
- ⭐ "Three Things AI Hasn't Replaced — A Cross-Domain Field Guide" — preserves the "Three Things" spine; signals multi-domain scope
- "What AI Doesn't Replace: A Field Guide for Expert Decision-Makers" — cleaner, drops the numbered list framing
- "The Three Limits of AI Across Expert Decision-Making" — academic register
- "Where AI Stops: A Field Guide for Doctors, Lawyers, Traders, and Coaches" — explicit audience listing
- "When the Algorithm Meets the Expert: Three Gaps That Haven't Closed" — narrative, op-ed-leaning
One-line goal
Show that three structural gaps (autonomy, outperformance, pedagogy) appear in every expert decision domain where AI is being marketed as a replacement. Poker, medicine, law, finance, and education are treated as parallel case studies, not as adjacent-domain footnotes. The cross-domain pattern is the real argument: this isn't a poker-specific problem.
Story arc
The cultural narrative — "AI is replacing the radiologist, the lawyer, the trader, the coach, the editor" — is being sold across many expert domains simultaneously. Each domain has its own marketed AI products, its own boom industry, its own headline experiments. When examined carefully, the narrative fails on the same three structural fronts everywhere: AI methodology depends on humans (autonomy), AI hasn't been demonstrated to outperform top experts at scale (outperformance), and AI doesn't transmit the tacit knowledge experts apprentice into (pedagogy). The article walks through the pattern across domains; the cross-domain consistency is the argument.
Section-by-section beats
Open: the cultural narrative is the same in every domain
The radiologist's AI replacement, the contract-review AI, the algorithmic trader, the AI poker coach, the AI editor. Each one has its own marketed product, its own boom industry, its own breathless headline experiment. They all share the same cultural arc: "the algorithm is catching up to the expert; the expert's job is at risk." Position this article as a cross-domain field guide: examined carefully, the narrative fails on the same three structural fronts wherever it's tested.
The autonomy gap — AI methodology is human-in-the-loop in every expert domain
The structural property: AI in expert decision domains doesn't run autonomously. Humans build it, audit it, override it, curate its training data, define its edge cases, decide when to deploy it and when to suspend it. Examples per domain:
- Poker — tree designers (Tombos21's "heart of the problem" quote), reward functions, anti-cheat review queues (DLv4 + analyst loops)
- Medicine — radiology-AI training-data curation, clinician override workflows, edge-case review by senior radiologists, FDA approval gating
- Law — contract-review AI's attorney-review queues, confidentiality / privilege flagging, judgment about ambiguous clauses still escalates to humans
- Finance — algorithmic-trading systems' human risk officers, regime-shift kill switches, regulatory approval and audit
- Education — adaptive-learning tutoring AI's curriculum designers, content moderation, teacher escalation queues
The cross-domain takeaway: even the most automated-looking AI runs on humans. The question for any AI claim in any domain is where the humans are, not whether. The marketing erases the humans; the operating reality includes them.
The outperformance gap — even where AI matches, it's narrow-setting overfitting, not generalization
Two layers — both important; the second is what the marketing erases.
Layer 1 — the empirical record across domains. No expert decision domain has a clean public demonstration of AI outperforming the best experts at the variety of work the actual job entails:
- Poker — Cepheus, Libratus, DeepStack, Pluribus, with caveats their own authors disclose
- Go — AlphaGo, AlphaGo Zero, MuZero (each in narrow ruleset; no generalized board-game master AI)
- Medicine — radiology AI matches but doesn't surpass top radiologists at the scope of cases the job actually presents (Stanford / Google Health / DeepMind studies all show match-or-near-match, not surpass)
- Law — contract-review AI matches junior associates on narrow tasks; doesn't approach partner-level judgment on novel situations
- Finance — algorithmic trading succeeds in specific market regimes; fails under regime shifts; the best systematic shops still hire human researchers and risk officers
- Editing / writing — LLM AI matches grammar-and-style review but doesn't approach editor-level judgment on what's worth saying
Layer 2: the overfitting layer. Even where AI matches top experts on a specific benchmark, the match is typically achieved by throwing massive data and compute at one narrow setting (one game, one image type, one document genre, one market regime, one curriculum) and fitting the model tightly to that setting. Change the setting and performance degrades. The pattern holds across game-playing AI: AlphaStar plays one version of StarCraft II against a specific opponent distribution; OpenAI Five played Dota 2 with a restricted hero pool and rule set; Suphx plays Mahjong under a specific ruleset; Pluribus plays 6-max NLHE at exactly the configuration its training expected. The pattern holds across professional AI as well: radiology AI trained on one population's images degrades on another's; legal AI trained on US case law fails in other jurisdictions.
The claim "AI matches the best" is in practice "AI matches the best in this one narrow setting we trained for." When the setting shifts (the rules change, the population changes, the distribution moves, or genuine reasoning and adaptation are required), the overfitting shows and performance drops. The technology hasn't demonstrated the leap to generalized expert reasoning that the marketing implies. The two layers compound: rigorous demonstrations are rare across domains, and even those show narrow-setting matching, not the generalized expert-equivalent reasoning the headlines suggest.
The pedagogy gap — AI doesn't transmit the tacit knowledge experts apprentice into
The third structural property: in every expert domain, the formalism (the textbook, the chart, the model's output) is the floor. The expert's role is the application of the formalism plus everything the formalism leaves out. AI implements the formalism. It doesn't apprentice you into the judgment.
Domain-by-domain:
- Poker coaching — solver charts give you the equilibrium frequencies. Coaches teach the deviation, the read, the tilt management, the study workflow. The chart is the floor; the coach raises the ceiling. (Cross-link to A.)
- Medical training — textbooks and AI-assisted decision support give you guidelines. Residency, mentorship, and clinical reasoning under uncertainty are what produce the senior physician. AI can summarize a chart; it can't conduct rounds.
- Legal apprenticeship — AI can produce a clean memo from a database. Junior associates learn what cases to bring up in front of a partner, what questions a judge will ask, what risk a client is willing to absorb. Tacit, transmitted in chambers.
- Trading and risk — quants ship models. Senior traders teach what model failure looks like, how to size during regime shifts, what stories the market is telling. AI runs the model; the trader runs the meta-model.
- Teaching — adaptive AI delivers content. Master teachers diagnose why a student isn't learning, sequence the next lesson, hold the room together. Apprenticed in classrooms, not in textbooks.
The cross-domain takeaway: in every expert domain, the formalism is necessary but insufficient. The expert's role is the structured application of the formalism plus the irreducible tacit knowledge of how to apply it. AI implements the first; humans transmit the second.
Synthesis — three gaps in every expert decision domain
Stack the three gaps. Each is independent. Each appears in every expert domain examined carefully. Each is an open structural problem, not a "we just need more compute" problem.
- Building the AI (autonomy) — the methodology depends on humans across all domains
- Outperforming the AI (outperformance) — narrow-setting matching, not generalized surpassing
- Teaching past the AI (pedagogy) — formalism is the floor, tacit transmission is the ceiling
The right diagnostic for any AI claim in any expert domain: "Where is the human in the loop? Where has the AI's outperformance been measured against the best of you, across the variety of work the actual job entails? Where does the formalism the AI implements stop reaching the practice?" When all three answers are clean, the AI claim survives. When any one is muddy, the claim doesn't.
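The diagnostic above can be sketched as a small checklist function. This is only an illustration of the decision rule, not part of the article's method: the `GapAnswers` type, its field names, and the `muddy_gaps` and `claim_survives` functions are hypothetical names introduced here, and reducing each answer to a yes/no value is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GapAnswers:
    """Hypothetical record of the three diagnostic answers for one AI claim."""
    human_role_is_clear: bool       # autonomy: is it clear where the human is in the loop?
    measured_against_best: bool     # outperformance: tested against top experts across the job's variety?
    formalism_limit_is_clear: bool  # pedagogy: is it clear where the formalism stops reaching the practice?


def muddy_gaps(answers: GapAnswers) -> List[str]:
    """Return the names of the gaps whose answers are not clean."""
    checks = {
        "autonomy": answers.human_role_is_clear,
        "outperformance": answers.measured_against_best,
        "pedagogy": answers.formalism_limit_is_clear,
    }
    return [gap for gap, clean in checks.items() if not clean]


def claim_survives(answers: GapAnswers) -> bool:
    """The claim survives only when all three answers are clean."""
    return not muddy_gaps(answers)
```

Returning the list of muddy gaps, and not only a pass/fail flag, keeps the output useful for the article's purpose: it names which of the three questions a given claim has not answered.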
Implications for expert decision-makers
What experts who are being told an AI is replacing them should make of this. The three gaps serve as a diagnostic to apply to any specific AI claim in their field. The next decade isn't AI versus experts; it's experts who learn to work with AI versus experts who don't. Amplification, not replacement, is the operating model that survives the three gaps.
Specific implications by domain:
- Coaches teaching with AI tools they understand the limits of
- Radiologists working alongside AI for triage; final read still human
- Lawyers using AI for discovery, judgment for strategy
- Traders running AI models with human risk oversight
- Teachers using adaptive AI for delivery, themselves for diagnosis and motivation
The amplification model is what scales. The replacement model fails the three-gap test in every domain.
Close
The cultural narrative will keep selling the same story across new domains as AI products launch. The diagnostic stays the same. Three gaps. Three independent open problems. Three reasons the expert's role has been amplified, not replaced. The marketing keeps trying to write that out of the picture; the operating reality keeps putting it back in.
Length budget
| § | Beat | Words |
|---|---|---|
| §1 | Open (cross-domain narrative) | 500 |
| §2 | Autonomy gap (cross-domain) | 1,200 |
| §3 | Outperformance gap (with overfitting layer, cross-domain) | 1,600 |
| §4 | Pedagogy gap (cross-domain) | 1,400 |
| §5 | Synthesis | 700 |
| §6 | Implications for expert decision-makers | 800 |
| §7 | Close | 400 |
| Total | | ~6,600 |
Within the 6,000–7,500 target. Tight enough for HBR feature length; for Atlantic / New Yorker, expand §3 or §4 to bring the total up to 7,500 if needed.
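The budget arithmetic can be checked with a few lines of Python. This is a minimal sketch: the section figures are copied from the table, the 6,000 and 7,500 bounds come from the stated target, and the variable names are illustrative.

```python
# Word budget per section, copied from the length-budget table.
BUDGET = {
    "§1 Open": 500,
    "§2 Autonomy gap": 1_200,
    "§3 Outperformance gap": 1_600,
    "§4 Pedagogy gap": 1_400,
    "§5 Synthesis": 700,
    "§6 Implications": 800,
    "§7 Close": 400,
}

# Target range stated in the outline.
TARGET_MIN, TARGET_MAX = 6_000, 7_500

total = sum(BUDGET.values())                        # planned word count
within_target = TARGET_MIN <= total <= TARGET_MAX   # is the plan inside the range?
headroom = TARGET_MAX - total                       # words available for expansion

print(f"total={total}, within_target={within_target}, headroom={headroom}")
```

With these figures the plan totals 6,600 words, which leaves 900 words of headroom for expanding §3 or §4 before reaching the 7,500 ceiling.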
How C differs from A and B
| Article | Anchor | Audience | Domain coverage |
|---|---|---|---|
| A | Coach pedagogy in poker | Coaches running schools, serious students | Poker only |
| B | Three gaps in poker (with adjacent-domain coda) | Tech-media + cultural readers | Poker primary; medicine/law/finance brief mention at end |
| C (this piece) | Three gaps across expert decision-making | AI researchers, decision-science readers, executives + policy | Poker as one case study; medicine, law, finance, education each get full treatment |
Cross-references
- A — coach pedagogy in poker. C cites for the poker-pedagogy example in §4.
- B — poker-anchored cultural piece. C and B share the three-gap framework but apply it differently (B = poker depth; C = cross-domain breadth).
- Manifesto (A1) — field-level argument behind the poker examples in §3.