An Anchor Delivers Only as Far as the Prior Reaches
What a pull request about "use cases" taught us about semantic anchors — with an experiment you can rerun yourself.
The short version. A semantic anchor works by triggering a concept the model already learned. Its power is therefore proportional to how densely that concept appears in the training data. We tested this directly: naming "Cockburn use cases" reshapes a generic answer into a complete fully-dressed use case (the anchor delivers), while naming "Use-Case 3.0" delivers nothing distinct, because the model silently falls back to the nearest concept it does know. That is why an anchor’s popup describes the triggered definition, not the state of the art, and why weak-prior terms belong in a contract (which supplies its own meaning), not an anchor.
A Discussion About One Anchor Started It
Simon Martinelli opened a pull request proposing to rename the Cockburn Use Cases anchor to plain Use Cases, fix the attribution, and modernise it with Use-Case 2.0 and 3.0.
His facts are correct. Ivar Jacobson invented use cases (OOPSLA 1987, Object-Oriented Software Engineering, 1992); Cockburn did not — his Writing Effective Use Cases (2001) codified how to write them well. The technique later grew into Use-Case 2.0 (Jacobson, Spence & Bittner, 2011) and Use-Case 3.0 (Jacobson, Spence & de Mendonca, 2024). As a daily practitioner, Simon added that most teams no longer separate Cockburn from Jacobson, and that the fully-dressed ceremony is rarely used.
When the discussion continued, Simon made a fair challenge: he pasted output from several chatbots to show they clearly know use cases. They do — and that turned out to be beside the point. The question is not whether a model knows the term, but which definition the term triggers and how far that knowledge reaches. The rest of this article answers that question with an experiment, and the answer is what decided the pull request.
A Semantic Anchor Is a Trigger, Not a Definition
A semantic anchor is a term that activates a rich concept already sitting in a model’s training data. You do not teach the model when you write "Cockburn Use Cases"; you pull a pre-computed prior off the shelf — goal levels, the fully-dressed template, extensions, stakeholders and interests — with a few words. That is the leverage the catalog sells: a short term stands in for pages of context the model has already absorbed.
The consequence is uncomfortable but unavoidable: an anchor can only be as strong as the training data behind it. If the concept is densely written about, the term fires reliably. If it is recent or niche, the term fires weakly — and, as we will see, the model does not go quiet. It substitutes.
The Experiment: Does Naming the Anchor Change the Output?
The cleanest test of an anchor is a before/after. Give a model a task that does not contain the anchor word, then give it the same task with the anchor, and compare. We used one task — "specify what an online shop must do when a customer places an order" — under five framings, on a weak model (Claude Haiku 4.5) and a strong one (Claude Opus 4.8). These are single runs per cell; the prompts are at the end so you can rerun them.
| Framing | Opus 4.8 | Haiku 4.5 |
|---|---|---|
| A: no anchor | Generic "the system shall…" requirements with IDs (FR-1, PRE-1). Not a use case. | Generic bullet-list requirements. No use-case structure. |
| B: "use cases" (bare term) | Full use-case model: primary and supporting actors, a described diagram, a complete "Place Order" specification. | A basic use case only: actors, preconditions, a numbered main flow. No extensions, no guarantees. |
| C: "fully-dressed use cases (Cockburn)" | Full fully-dressed apparatus: Main Success Scenario, sea-level goal, Stakeholders & Interests, Extensions, guarantees. | Now the full apparatus too: Primary Actor, Main Success Scenario, Extensions, pre/postconditions. |
| D: "Use-Case 2.0 slices (Jacobson)" | Real use-case slices, incremental and test-backed. | Real slices too: "Slice 1: Basic Checkout (MVP)", then further increments. |
| E: "Use-Case 3.0 slices (Jacobson)" | Flags it: "I’m not aware of an official 'Use-Case 3.0'… I’ll treat your request as the slice technique", then delivers 2.0 slices. | Prints a confident document headed "Use-Case 3.0" whose body is plain Cockburn, with no slices at all. |
Three things fall out of this.
The anchor secures the behaviour, even when the model is weaker
The most useful result is the gap between B and C across the two models. On Opus, the bare word "use cases" already produces the full structure — the prior is strong enough that almost no prompting is needed. On Haiku, the bare word yields only a basic use case, but the explicit anchor "fully-dressed use cases according to Cockburn" lifts it to the same full apparatus the strong model produced for free.
That is the anchor’s quiet superpower: it pins the behaviour you want regardless of which model runs the prompt. You usually cannot pin the model itself. Tomorrow the same system might run against a cheaper tier, a local open-weight model, or next year’s release. The explicit anchor is insurance: it carries the structure into a weaker prior that would not have produced it on its own. An anchor is not only brevity; it is portability across models.
An anchor delivers whenever its words name a concept the model holds
"Use-Case 2.0 slices" produced real slices on both models (framing D), even though the small model could barely describe Use-Case 2.0 when asked about it directly. The version number is not what fires; the operative word "slice" is — vertical slicing is itself a dense concept the model can act on. Recall and execution are different: a model can fail to explain a method yet still apply its core move when that move has a dense name.
When the concept is absent, the anchor silently substitutes
Only framing E fails, because "Use-Case 3.0" names nothing the model holds densely. Neither model errored. Opus reached for the nearest concept it did hold and said so:
Ivar Jacobson’s published, named technique is Use-Case 2.0 (Jacobson, Spence & Bittner, 2011) — the one that introduces use-case slices. I’m not aware of an official "Use-Case 3.0" release from Jacobson; I’ll treat your request as "apply the use-case slice technique" … and flag this so you can correct me.
framing E
Opus substituted Use-Case 2.0 and told you. Haiku did the same kind of substitution, but silently and one step further back: in our run it printed a confident document headed "Use-Case 3.0: Customer Places an Order" whose body is ordinary Cockburn (Stakeholders & Interests, Preconditions, Main Success Scenario, Extensions) with not a single slice in it. The label said 3.0; the content was 2001. That output is a single run — but it sits on top of the universal knowledge gap shown below, and it is exactly the failure mode a careless anchor user never notices.
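The label-versus-content mismatch described above can be checked mechanically. The following is a minimal sketch, not part of the original experiment: it assumes the first line of an output is its heading and that a genuinely slice-based answer mentions the word "slice" somewhere in its body. The function name and the sample text are hypothetical; the sample is a stand-in shaped like the Haiku output, not the real transcript.

```python
import re


def label_content_mismatch(output: str, label: str, required_marker: str) -> bool:
    """Return True when the heading carries `label` but the body never
    mentions `required_marker` (case-insensitive substring match)."""
    lines = output.strip().splitlines()
    if not lines:
        return False
    heading, body = lines[0], "\n".join(lines[1:])
    has_label = label.lower() in heading.lower()
    has_marker = re.search(re.escape(required_marker), body, re.IGNORECASE) is not None
    return has_label and not has_marker


# Illustrative stand-in for the kind of output described above (assumption).
haiku_like = """Use-Case 3.0: Customer Places an Order
Stakeholders & Interests
Preconditions
Main Success Scenario
Extensions"""

print(label_content_mismatch(haiku_like, "Use-Case 3.0", "slice"))
```

A keyword check like this only catches the crudest substitutions; an answer that mentions slices under a 3.0 heading still needs a human reader to tell 2.0 content from 3.0 content.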
How Far the Prior Actually Reaches
The anchor experiment shows leverage tracks density. A second set of probes maps where the density is. We asked five plain questions, each on Haiku 4.5, Sonnet 4.6, Opus 4.8 and Fable 5 — Anthropic’s newest model, added on its release day — with no internet and no custom instructions.
| Probe | Haiku 4.5 | Sonnet 4.6 | Opus 4.8 | Fable 5 |
|---|---|---|---|---|
| Who invented use cases? | Jacobson (high) | Jacobson (high) | Jacobson (high) | Jacobson (high) |
| Associations with "use cases" | Jacobson + Cockburn + UML | Cockburn (+ Larman) | Cockburn > Jacobson/UML | Jacobson + Cockburn + UML |
| "Write a use case" (default) | Cockburn-shaped, no slices | Cockburn fully-dressed, no slices | Cockburn fully-dressed, no slices | Cockburn fully-dressed, no slices |
| What is Use-Case 2.0? | low / thin | medium / moderate | high / rich | high / rich (names slices + 6 principles) |
| What is Use-Case 3.0? | low / thin, doubts it exists | low / thin, reaching, doubts | low / thin, doubts it exists | low / thin, hedges, guesses AI-era |
Four findings, each robust across the four models:
- The inventor is known. Every model credits Jacobson, with high confidence. The popular misattribution to Cockburn lives in casual human shorthand, not in the model.
- The default use case is Cockburn-shaped. Ask any of them to write a use case and you get fully-dressed structure, never slices, never 2.0/3.0. The dense prior is the Cockburn era.
- Use-Case 2.0 tracks model scale and recency. Opus describes it richly, Sonnet moderately, Haiku barely, and Fable 5, the newest model, richest of all: it recalls the slices and the six principles by name. The thinner prior survives best in the larger, more recent model, though, as framing D showed, even Haiku can act on "slices" when told to. Describing a method and applying its core move are different things.
- Use-Case 3.0 is a gap for everyone. Even the newest frontier models, with the latest cutoffs, are thin and uncertain. Haiku put it plainly: it was "not even confident enough to know if there’s a 2.0 to confuse it with." Fable 5 sharpens the point: released the day we ran the probe, it holds the richest Use-Case 2.0 prior of the four, yet on 3.0 it still hedges and reaches for an unverified "AI-era" evolution. A dense 2.0 prior does not buy a 3.0 one; the two sit at different densities in the data, not at different model strengths. If the strongest, most recent models cannot reach 3.0, older or smaller ones certainly cannot.
The fourth point is the strongest form of the argument. We could not test older model generations — Claude 3.x is no longer reachable through this account — but we did not need to. The gap shows up on the latest models; older ones only widen it.
Cross-Model Validation
Jens Grote 2026-06-09
The experiment above tested only Claude tiers. A reproduction of the full A–E battery against GPT-5, GPT-5-mini and Gemini 2.5 Flash confirms the mechanism is model-family-independent — and surfaces a third failure mode the Claude-only test could not reveal.
Setup
The same five framings (A–E) and five prior-mapping probes (P1–P5) were run against three non-Claude models, each in a clean session without system prompts or custom instructions:
- GPT-5 (OpenAI, May 2026)
- GPT-5-mini (OpenAI, May 2026)
- Gemini 2.5 Flash (Google, May 2026)
Raw outputs: anchor-activation-test-20260609/.
The Cockburn Anchor Fires Universally
In framing C ("fully-dressed use cases according to Cockburn"), every model produces the full apparatus — Primary Actor, Stakeholders & Interests, Main Success Scenario, Extensions, goal level — regardless of vendor or size. The dense prior transcends model families.
Three Failure Modes
Where the original experiment found two behaviours in framing E (transparent substitution on Opus; silent substitution on Haiku), the cross-model test surfaces a third.
1. Transparent substitution
The model recognises the gap and says so. GPT-5-mini on probe P5:
I’m not certain which "Use‑Case 3.0" you mean — several different authors and communities have used that label. Could you tell me which source or context you’re referring to?
P5
This is the safest failure — you know the anchor did not fire.
2. Silent substitution
The model delivers content under the requested label, but the content belongs to an older concept. Gemini 2.5 Flash on framing E prints a document headed "Use-Case 3.0 slices" whose body is recognisably Use-Case 2.0 slice technique. The label says 3.0; the content is 2.0.
This confirms the original finding — and shows it is not Claude-specific. Gemini does it too.
3. Confabulation
The model invents a confident, internally consistent but fictitious description. This is the hardest failure to detect.
GPT-5 on probe P5 produces "10 principles of Use-Case 3.0": Outcome‑first, Usage‑centric, Separate need from solution, Slice for flow, Example‑ and test‑driven, Just‑enough just‑in‑time detail, Scalable and fractal, Single shared model, Holistic quality, Visual and collaborative.
Gemini 2.5 Flash goes further — it attributes "10 foundational principles" including Universally Applicable, Focus on the Big Picture, Tell the Whole Story, Trigger Conversations, Prioritize Readability.
Neither list matches reality. The published Use-Case Foundation (Jacobson, 2024) lists nine principles with different wording (IJI guide). The count is wrong and the names are invented. This is not a denser prior; it is fabrication.
Confabulation is more dangerous than silent substitution because:
- It passes superficial review (numbered principles look authoritative)
- It creates false confidence
- It is undetectable without domain knowledge or source verification
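Source verification can be partly scripted once you hold a verified reference. The sketch below is an illustration under stated assumptions: it uses the count of nine given above for the published guide, and the ten names GPT-5 produced on probe P5 as quoted above. The helper names are hypothetical, and the published principle names are deliberately left out rather than guessed.

```python
# Count stated in the text above for the published guide. The published
# names are not reproduced here; supply them from the source yourself.
PUBLISHED_PRINCIPLE_COUNT = 9

# The principles GPT-5 listed on probe P5, as quoted above.
GPT5_CLAIMED = [
    "Outcome-first",
    "Usage-centric",
    "Separate need from solution",
    "Slice for flow",
    "Example- and test-driven",
    "Just-enough just-in-time detail",
    "Scalable and fractal",
    "Single shared model",
    "Holistic quality",
    "Visual and collaborative",
]


def count_mismatch(claimed: list, expected: int) -> bool:
    """True when the model lists a different number of items than the source."""
    return len(claimed) != expected


def unverified_names(claimed: list, verified: list) -> list:
    """Return the claimed names that do not appear in the verified list
    (case-insensitive exact match)."""
    known = {name.casefold() for name in verified}
    return [name for name in claimed if name.casefold() not in known]


print(count_mismatch(GPT5_CLAIMED, PUBLISHED_PRINCIPLE_COUNT))
```

The count check is the weakest possible test: a fabricated list of the right length passes it. `unverified_names` is the stronger check, but it only works once someone has copied the real names from the published guide.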
The Anchor Viability Horizon
These results point forward. An anchor’s viability is not permanent — it tracks the density of the training data at training time.
- Stable anchors (Cockburn, GoF, SOLID): densely represented; they fire reliably across all model families and are likely to keep doing so for years to come.
- Emerging anchors (Use-Case 2.0): fire on strong models, degrade on smaller ones. Viability should improve as the published corpus grows.
- Pre-anchor concepts (Use-Case 3.0): not yet viable. They are too thin in the corpus, so models confabulate them rather than activate them. These belong in contracts, not anchors.
As models retrain on newer data, the boundary shifts. A concept that confabulates today may become a reliable anchor in a future generation — but only if the published corpus grows to match. The catalog should periodically re-run this battery to track which concepts have crossed the density threshold.
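One way to make that periodic re-run comparable over time is to reduce each probe row to a category. The sketch below is an assumption-laden illustration: the three category names come from the list above, but the decision rule (all high, all low, anything else) is a hypothetical threshold, not something the experiment established. The two data rows are the Use-Case 2.0 and 3.0 probe results from the prior-mapping table.

```python
from enum import Enum
from typing import Dict


class Viability(Enum):
    STABLE = "stable anchor"
    EMERGING = "emerging anchor"
    PRE_ANCHOR = "pre-anchor concept"


def classify(levels: Dict[str, str]) -> Viability:
    """Map per-model prior levels ("low" / "medium" / "high") to a category.

    Assumed rule: every model high -> stable; every model low -> pre-anchor;
    any mix -> emerging.
    """
    if not levels:
        raise ValueError("need at least one model result")
    values = set(levels.values())
    if values == {"high"}:
        return Viability.STABLE
    if values == {"low"}:
        return Viability.PRE_ANCHOR
    return Viability.EMERGING


# Probe results as reported in the prior-mapping table.
USE_CASE_2_0 = {"Haiku 4.5": "low", "Sonnet 4.6": "medium", "Opus 4.8": "high", "Fable 5": "high"}
USE_CASE_3_0 = {"Haiku 4.5": "low", "Sonnet 4.6": "low", "Opus 4.8": "low", "Fable 5": "low"}

for name, row in [("Use-Case 2.0", USE_CASE_2_0), ("Use-Case 3.0", USE_CASE_3_0)]:
    print(name, "->", classify(row).value)
```

Storing the per-model levels rather than only the category keeps the raw observation available, so a later run can show which model moved when a concept crosses the threshold.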
Summary
| Framing | GPT-5 | GPT-5-mini | Gemini 2.5 Flash |
|---|---|---|---|
| A: no anchor | Generic requirements list. | Generic requirements list. | Sequential phases. No use-case structure. |
| B: "use cases" | Full use-case model with actors and specification. | Use-case structure with actors and flows. | Use-case structure, less detailed than C. |
| C: "Cockburn fully-dressed" | Full apparatus: goal level, stakeholders, MSS, extensions, guarantees. | Full apparatus. | Full apparatus: primary actor, scope, level, stakeholders, MSS, extensions. |
| D: "Use-Case 2.0 slices" | Real slices with acceptance criteria. | Real slices, incremental. | Real slices, walking skeleton first. |
| E: "Use-Case 3.0 slices" | Slices labelled "3.0"; content is 2.0. Confabulates on prior probe. | Slices labelled "3.0"; content is 2.0. Transparent on prior probe. | Slices labelled "3.0"; content is 2.0. Confabulates on prior probe. |
Implications
- The anchor mechanism is model-family-independent. The catalog’s value holds across vendors.
- The three-layer model (anchor / contract / article) is not optional: confabulation makes weak-prior terms actively dangerous as anchors.
- The catalog needs a viability test per anchor: do models confabulate when asked about the term? If yes, it is not ready.
Why the Anchor Lags Real Practice
There is a second gap, and Simon named it. A model’s knowledge is a snapshot of what people wrote down, weighted by how much they wrote. Day-to-day practice is mostly not written down: that teams skip the fully-dressed ceremony, that they treat Cockburn and Jacobson as one thing, that user stories absorbed much of the job. None of that sits in the corpus at volume, so none of it shapes the prior.
The anchor therefore reflects the documented consensus, which peaked in the Cockburn era, while practice walked on without updating the record at the same volume. The anchor faithfully reflects the map. The map was never the territory — it is a record of what got published.
What This Means for the Catalog
The episode draws a clean line through the whole project, and it decides the pull request.
First, the anchor’s popup describes the definition the term triggers in the LLM — not a state-of-the-art summary. Rewriting the Cockburn Use Cases popup around 3.0 and slices would describe a concept the model cannot reliably activate; the text would be correct and the anchor would be broken. So the anchor stays Cockburn Use Cases: it is the precise trigger, it matches its own content, and it is what our Specification contract already builds on.
Second, the catalog needs three layers, not one, and the experiment shows why:
| Layer | What it captures, and why it is safe |
|---|---|
| Anchor | A dense prior the model already holds. Safe because naming it reliably triggers the real concept (framing C). |
| Contract | Vocabulary your team agreed on, supplied in the text. Safe even when the prior is weak, because the meaning travels with the contract, so there is nothing to silently substitute. This is the right home for Use-Case 2.0/3.0 slices, if you use them. |
| Article | Meta-knowledge about a term: attribution, history, the gap between training data and practice. This page is one. |
A weak prior is not a failed anchor; it is a candidate for a contract. A correction or a piece of history is not anchor content; it is an article. And an anchor that lags practice is doing its job — it tells you, precisely, where the documented consensus stopped.
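The difference between the anchor and contract layers shows up in how a prompt is assembled. This is a minimal sketch under assumptions: the function name and the wording of the definition header are hypothetical, and the definition itself is a placeholder, since a contract’s text must come from your team rather than from this page.

```python
from typing import Optional

TASK = "specify what an online shop must do when a customer places an order."


def build_prompt(term: str, task: str, definition: Optional[str] = None) -> str:
    """Anchor: name the term and rely on the model's prior.
    Contract: prepend the agreed definition so the meaning travels with the prompt."""
    instruction = f"Using {term}, {task}"
    if definition is None:
        return instruction
    return f'Definition of "{term}" for this task:\n{definition}\n\n{instruction}'


# Dense prior: the term alone is enough.
anchor_prompt = build_prompt("fully-dressed use cases according to Cockburn", TASK)

# Weak prior: the contract carries the meaning (placeholder text, not a real definition).
contract_prompt = build_prompt(
    "Use-Case 3.0 slices",
    TASK,
    definition="<paste your team's agreed definition here>",
)

print(anchor_prompt)
print(contract_prompt)
```

With the definition inlined, the model has no need to reach for a neighbouring concept; whether it follows the supplied definition faithfully is still something to check in the output.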
Run It Yourself
None of this requires trusting our runs. Paste these into any chatbot, ideally with web access turned off, and watch the pattern. The first five map the prior; the last five test the anchor.
# Mapping the prior
1. List the key concepts you associate with "use cases".
2. Write a use case for a customer placing an order. Use your default format.
3. Who invented use cases, and who is most associated with writing them well?
4. What is "Use-Case 2.0"? Who created it, when, and what did it add?
5. What is "Use-Case 3.0"? List its specific principles.
# Testing the anchor — same task, five framings. Compare the structure.
A. Specify what an online shop must do when a customer places an order.
B. Using use cases, specify what an online shop must do when a customer
places an order.
C. Using fully-dressed use cases according to Cockburn, specify what an
online shop must do when a customer places an order.
D. Using Use-Case 2.0 slices according to Jacobson, specify what an online
shop must do when a customer places an order.
E. Using Use-Case 3.0 slices according to Jacobson, specify what an online
shop must do when a customer places an order.
Watch for two tells: in prompt 5, whether the model hedges or invents; and across A–E, whether the named anchor actually changes the structure or the model quietly hands you something else.
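If you would rather script the A–E comparison than paste by hand, the battery reduces to a loop. The sketch below assumes you supply your own `ask` function that sends a prompt to a model and returns its text; no vendor API is shown. The marker lists are hypothetical structural tells chosen for illustration, not the criteria used in our runs.

```python
from typing import Callable, Dict, Tuple

TASK = "specify what an online shop must do when a customer places an order."

# Prefixes taken from framings A-E above; A has no anchor.
FRAMINGS: Dict[str, str] = {
    "A": "",
    "B": "Using use cases, ",
    "C": "Using fully-dressed use cases according to Cockburn, ",
    "D": "Using Use-Case 2.0 slices according to Jacobson, ",
    "E": "Using Use-Case 3.0 slices according to Jacobson, ",
}

# Assumed structural tells (lower-case substrings); adjust to taste.
MARKERS: Dict[str, Tuple[str, ...]] = {
    "fully_dressed": ("main success scenario", "extensions"),
    "slices": ("slice",),
}


def prompt_for(key: str) -> str:
    """Build the full prompt for one framing, capitalising the bare task for A."""
    prefix = FRAMINGS[key]
    return prefix + TASK if prefix else TASK[0].upper() + TASK[1:]


def structure_found(output: str) -> Dict[str, bool]:
    """Report which marker groups are fully present in the output."""
    text = output.lower()
    return {name: all(m in text for m in needles) for name, needles in MARKERS.items()}


def run_battery(ask: Callable[[str], str]) -> Dict[str, Dict[str, bool]]:
    """Run all five framings through `ask` and score each output."""
    return {key: structure_found(ask(prompt_for(key))) for key in FRAMINGS}


def fake_ask(prompt: str) -> str:
    # Stand-in model for a dry run; replace with a real call.
    return "Slice 1: Basic Checkout" if "slices" in prompt else "The system shall accept orders."


for key, found in run_battery(fake_ask).items():
    print(key, found)
```

The scores only say whether the expected vocabulary appears. For framing E in particular, a "slices" hit is exactly what silent substitution looks like, so read that output rather than trusting the flag.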
A note on method, for the sceptical. Our first runs used in-session sub-agents and were contaminated: they inherit the project’s own
Thanks to Simon Martinelli, whose pull request — and his insistence on testing the claim against real chatbots — turned a naming discussion into a measurement.