Skill: Socratic Code-Theory Recovery
The Semantic Anchors project ships a Claude Code Skill that packages the brownfield workflow as an installable artifact. Once installed, the skill guides any compatible AI coding assistant through the two-phase recovery of a program’s "theory" (Naur 1985) from source code.
What it does
Recovers documentation from a brownfield codebase without hallucinating the parts the code cannot tell you. The skill enforces an auditable separation between code-derivable facts and open questions that must be answered by humans.
Phase 1 — Build the Question Tree
The skill prompts the LLM to build a Question Tree from five root questions about a bounded context: Problem/Users, Specification, Architecture, Quality Goals, and Risks. The second level of the tree is fixed. Every run emits the same enumerated nodes, Q1.1 through Q5.5: six PRD elements, six specification categories, the twelve arc42 chapters, the eight ISO/IEC 25010 characteristics plus a priority question, and five risk categories. Because these Q-IDs are stable, trees from different runs can be diffed node by node. Adaptive, code-driven decomposition happens only below that fixed level. A node keeps splitting until each leaf is either pinned to one specific file:line or identified as something the code cannot answer, so tree depth tracks code density and a large bounded context yields a deeper tree. Each leaf is classified:
- [ANSWERED]: the LLM found the answer in the code, with <file>:<line> evidence
- [OPEN]: the answer is not derivable from code; the leaf is tagged with a Category and the role that must answer it (Product Owner, Architect, Developer, Domain Expert, Operations)
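Because the second level is fixed, its 38 Q-IDs (6 + 6 + 12 + 9 + 5) can be enumerated up front, and two runs can be compared ID by ID. The Python below (3.9 or later) is a minimal sketch of that idea, not code shipped with the skill: `FIXED_CHILDREN`, `fixed_ids`, `qid_key` and `diff_runs` are hypothetical names, and each run is assumed to have already been reduced to a mapping from Q-ID to leaf status.

```python
from typing import Optional

# Fixed second-level node counts under each root question, as described above.
# The dictionary itself is a hypothetical representation, not the skill's format.
FIXED_CHILDREN = {
    "Q1": 6,   # Problem/Users: six PRD elements
    "Q2": 6,   # Specification: six specification categories
    "Q3": 12,  # Architecture: the twelve arc42 chapters
    "Q4": 9,   # Quality Goals: eight ISO/IEC 25010 characteristics + priority
    "Q5": 5,   # Risks: five risk categories
}


def fixed_ids() -> list[str]:
    """Return the stable second-level Q-IDs that every run emits."""
    return [f"{root}.{i}" for root, n in FIXED_CHILDREN.items() for i in range(1, n + 1)]


def qid_key(qid: str) -> tuple[int, ...]:
    """Numeric sort key, so that Q3.10 sorts after Q3.9 rather than after Q3.1."""
    return tuple(int(part) for part in qid[1:].split("."))


def diff_runs(
    run_a: dict[str, str], run_b: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Report every Q-ID whose leaf status differs between two runs.

    Each run maps a Q-ID to a status such as "ANSWERED" or "OPEN".
    An ID missing from one run is reported as None on that side.
    """
    changed = {}
    for qid in sorted(set(run_a) | set(run_b), key=qid_key):
        a, b = run_a.get(qid), run_b.get(qid)
        if a != b:
            changed[qid] = (a, b)
    return changed


if __name__ == "__main__":
    print(len(fixed_ids()), fixed_ids()[:3])
    print(diff_runs({"Q3.9": "OPEN"}, {"Q3.9": "ANSWERED"}))
```

Nodes below the fixed level come from adaptive decomposition, so their IDs may not line up between runs. A diff at that depth is only meaningful where both runs produced the same ID.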
Phase 1 outputs two AsciiDoc files: QUESTION_TREE.adoc (the full reasoning trace) and OPEN_QUESTIONS.adoc (the handoff, with the [OPEN] leaves grouped by role).
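The leaf classification rules and the role-grouped handoff can be made concrete with a small data model. The sketch below assumes a hypothetical `Leaf` record and an invented AsciiDoc layout for the handoff (one `==` section per role, one bullet per question). The real OPEN_QUESTIONS.adoc format is defined by the skill's Phase 1 prompt, not by this code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Roles named by the skill; listing them in a fixed order keeps the output stable.
ROLES = ["Product Owner", "Architect", "Developer", "Domain Expert", "Operations"]


@dataclass
class Leaf:
    """Hypothetical record for one Question Tree leaf."""
    qid: str
    question: str
    status: str                     # "ANSWERED" or "OPEN"
    evidence: Optional[str] = None  # "<file>:<line>", required for ANSWERED
    category: Optional[str] = None  # required for OPEN
    ask: Optional[str] = None       # role that must answer, required for OPEN


def validate(leaf: Leaf) -> None:
    """Reject leaves that break the ANSWERED/OPEN contract."""
    if leaf.status == "ANSWERED":
        if not leaf.evidence:
            raise ValueError(f"{leaf.qid}: ANSWERED leaf needs file:line evidence")
    elif leaf.status == "OPEN":
        if not leaf.category or leaf.ask not in ROLES:
            raise ValueError(f"{leaf.qid}: OPEN leaf needs a Category and a known role")
    else:
        raise ValueError(f"{leaf.qid}: unknown status {leaf.status!r}")


def render_handoff(leaves: list[Leaf]) -> str:
    """Validate all leaves, then render the OPEN ones grouped by role."""
    by_role = defaultdict(list)
    for leaf in leaves:
        validate(leaf)
        if leaf.status == "OPEN":
            by_role[leaf.ask].append(leaf)
    lines = ["= Open Questions", ""]
    for role in ROLES:
        if not by_role[role]:
            continue  # skip roles with nothing to answer
        lines += [f"== {role}", ""]
        lines += [f"* {leaf.qid} [{leaf.category}] {leaf.question}" for leaf in by_role[role]]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_handoff([
        Leaf("Q1.2.1", "Who are the primary users?", "OPEN",
             category="Stakeholders", ask="Product Owner"),
        Leaf("Q3.5.1", "Where is the HTTP entry point?", "ANSWERED",
             evidence="src/app/main.py:12"),
    ]))
```

Validating before rendering means a malformed leaf stops the handoff instead of quietly dropping out of it, which matches the skill's goal of keeping the ANSWERED/OPEN split auditable.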
Between phases — Team answers the OPEN leaves
OPEN_QUESTIONS.adoc is routed to the people who hold each role, and they write their answers directly into the file. A question nobody can answer yet gets an explicit (deferred) marker instead of an invented answer.
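Before Phase 2 starts, it can help to confirm that no OPEN leaf was silently skipped. A minimal sketch, assuming the team's answers have already been extracted from OPEN_QUESTIONS.adoc into a dictionary keyed by Q-ID. The extraction step, the function names and the example answers are all hypothetical.

```python
from typing import Optional

DEFERRED = "(deferred)"  # explicit marker for questions the team postpones


def unresolved(answers: dict[str, Optional[str]]) -> list[str]:
    """Q-IDs the team has neither answered nor marked (deferred)."""
    return [qid for qid, text in answers.items() if not (text and text.strip())]


def deferred(answers: dict[str, Optional[str]]) -> list[str]:
    """Q-IDs explicitly deferred; Phase 2 must not invent answers for these."""
    return [qid for qid, text in answers.items() if text and DEFERRED in text]


def ready_for_phase2(answers: dict[str, Optional[str]]) -> bool:
    """True once every question is either answered or explicitly deferred."""
    return not unresolved(answers)


if __name__ == "__main__":
    answers = {
        "Q1.2.1": "Claims clerks in regional offices",
        "Q3.9.1": "(deferred) pending vendor review",
        "Q4.9.1": "",
    }
    print(unresolved(answers), deferred(answers), ready_for_phase2(answers))
```

Treating a blank answer as unresolved rather than as deferred keeps the distinction the workflow relies on: deferral has to be a deliberate, visible decision.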
Phase 2 — Synthesize documentation
The skill takes the answered tree and produces a PRD, Cockburn use cases, an arc42 architecture document, and Nygard ADRs with Pugh matrices. Code-derived claims cite the file:line evidence from their [ANSWERED] leaf, and team-supplied facts are marked (team answer). The Question Tree is temporary scaffolding, so Q-IDs stay out of the final documents.
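The citation rules and the no-Q-IDs rule lend themselves to a rough automated check on the synthesized files. This is a heuristic sketch, not part of the skill. It assumes AsciiDoc output, treats every non-heading line as a claim, and recognizes evidence by a simple `path:line` pattern, so it will over-report on tables and link lists and can be fooled by text such as `localhost:8080` that merely looks like a citation. The names `lint`, `QID`, `EVIDENCE` and `TEAM` are hypothetical.

```python
import re

# A Q-ID such as Q3.9 or Q3.9.1 (root 1-5, at least one dotted level).
QID = re.compile(r"\bQ[1-5](?:\.\d+)+\b")
# A file:line citation such as src/app/main.py:12 (rough heuristic).
EVIDENCE = re.compile(r"[\w./-]+:\d+")
TEAM = "(team answer)"


def lint(doc: str) -> list[str]:
    """Flag leaked Q-IDs and lines with neither evidence nor a team-answer mark."""
    problems = []
    for n, line in enumerate(doc.splitlines(), start=1):
        text = line.strip()
        if not text or text.startswith("="):  # skip blanks and AsciiDoc headings
            continue
        if QID.search(text):
            problems.append(f"line {n}: Q-ID leaked into final document")
        if not EVIDENCE.search(text) and TEAM not in text:
            problems.append(f"line {n}: claim has neither file:line evidence nor (team answer)")
    return problems


if __name__ == "__main__":
    sample = "= Architecture\n\nThe API entry point is src/app/main.py:12.\nThe system scales well.\n"
    for problem in lint(sample):
        print(problem)
```

The skill's rule applies per claim, not per line, so a production check would need a better notion of what counts as a claim than "any non-heading line".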
When to use it
Use this skill when:
- Documentation is missing, outdated, or untrusted, and a change is about to be made.
- You want documentation that an auditor or new team member can trust: every claim traces back to either code or a named team answer.
- You want to surface the open questions in the system, not just synthesize plausible-sounding prose over them.
Do not use it when:
- You are doing greenfield development; use the spec-driven workflow instead.
- You want to reverse-engineer the whole system at once; the skill is designed to work on one bounded context at a time.
- The code is not runnable; fix that first.
Installation
The skill follows the agentskills.io specification. Reference it from your project’s instruction file for any compatible AI tool.
Claude Code
Recommended: install via the Claude Code Plugin Marketplace. The Semantic Anchors repository is published as a Claude Code marketplace, and its plugin bundles all Semantic Anchors skills (Translator, Onboarding, and Socratic Code-Theory Recovery) in one install.
Run inside any Claude Code session:
```text
/plugin marketplace add LLM-Coding/Semantic-Anchors
/plugin install semantic-anchors@semantic-anchors
```
The skills become available immediately — Claude Code picks up the socratic-code-theory-recovery skill from the installed plugin without any further edits to CLAUDE.md.
Alternative: reference the skill manually in CLAUDE.md when you cannot or do not want to use the marketplace flow (corporate Claude installs, pinned versions, custom skill locations):
```markdown
## Skills
Use the socratic-code-theory-recovery skill from
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
when recovering documentation from a brownfield bounded context.
Phase 1 prompt:
https://github.com/LLM-Coding/Semantic-Anchors/blob/main/skill/socratic-code-theory-recovery/prompts/phase-1-question-tree.md
Phase 2 prompt:
https://github.com/LLM-Coding/Semantic-Anchors/blob/main/skill/socratic-code-theory-recovery/prompts/phase-2-synthesize.md
```
Codex
Codex supports AGENTS.md for repository instructions:
```markdown
## Documentation Recovery
When working on a brownfield bounded context without documentation, use
the Socratic Code-Theory Recovery skill:
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
The skill enforces a two-phase workflow: build a Question Tree first
([ANSWERED] with code evidence vs [OPEN] with role), let the team answer
the OPEN leaves, then synthesize self-contained documentation that traces
every claim to code evidence or a team answer.
```
Gemini CLI
Add to GEMINI.md:
```markdown
## Brownfield Documentation Recovery
For recovering documentation from existing code, follow the
Socratic Code-Theory Recovery workflow:
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
Build a Question Tree before writing any documentation. Mark each leaf
[ANSWERED] (with file:line evidence) or [OPEN] (with Category and Ask role).
Synthesize docs from the answered tree only after the team has filled in
the OPEN leaves. The docs must be self-contained: cite file:line evidence
for code-derived claims, mark team input with (team answer). Q-IDs stay
out of the output.
```
Cursor
Add to a rule file under .cursor/rules/ or to the legacy .cursorrules file in your project:
```markdown
## Brownfield Documentation Recovery
When asked to document an existing module without docs, use the
Socratic Code-Theory Recovery workflow:
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
Build a Question Tree first. Each leaf must be [ANSWERED] (with code
evidence) or [OPEN] (with Category and Ask role). Do not write
documentation until the team has answered the [OPEN] leaves.
```
GitHub Copilot
Add to .github/copilot-instructions.md:
```markdown
## Brownfield Recovery
For brownfield documentation tasks, follow the Socratic Code-Theory
Recovery workflow at
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
Two phases: first a Question Tree separating code-derivable facts from
open questions routed by role; second, synthesis into self-contained
documentation — code-evidenced or team-answered — after the team fills
the gaps.
```
Amazon Kiro
Kiro builds on spec-driven development; this skill is the brownfield counterpart. Add to your project’s specs/ directory or to a spec file:
```markdown
## Brownfield Documentation Recovery (Spec Onboarding)
When onboarding an existing bounded context that has no spec, use the
Socratic Code-Theory Recovery skill:
https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
The skill produces a Question Tree that classifies every claim as
[ANSWERED] (code evidence) or [OPEN] (role-routed). The synthesized
outputs are compatible with Kiro's spec format: a PRD, Cockburn use
cases (User Goal level), an arc42 architecture description, and Nygard
ADRs with Pugh matrices. Use these as the starting point for the
generated spec.
```
Other AI tools
Any AI assistant that accepts a system prompt or custom instructions can use this skill. Point it to:
- SKILL.md (overview): https://github.com/LLM-Coding/Semantic-Anchors/blob/main/skill/socratic-code-theory-recovery/SKILL.md
- Phase 1 prompt: https://github.com/LLM-Coding/Semantic-Anchors/blob/main/skill/socratic-code-theory-recovery/prompts/phase-1-question-tree.md
- Phase 2 prompt: https://github.com/LLM-Coding/Semantic-Anchors/blob/main/skill/socratic-code-theory-recovery/prompts/phase-2-synthesize.md
What’s inside the skill
| File | Role |
|---|---|
| SKILL.md | Frontmatter, when-to-use, two-phase workflow overview, what the LLM can and cannot recover, drift handling |
| prompts/phase-1-question-tree.md | The copy-paste Phase 1 prompt plus post-prompt sanity-check and team-routing instructions |
| prompts/phase-2-synthesize.md | The Phase 2 prompt producing PRD, Cockburn use cases, arc42, and Nygard ADRs |
| | arc42’s 12 chapters as the fixed Q3.1–Q3.12 nodes |
| | Fully Dressed fields as use-case leaves; persona vs system use cases |
| | 8 quality characteristics as the fixed Q4.1–Q4.8 nodes; mechanism-vs-target split |
| | ADR fields as the Q3.9 sub-tree; what makes a decision architecturally significant; Pugh-matrix guidance |
| | Strict format for |
| | Worked |
Further reading
- Brownfield Workflow: the full methodology that this skill packages
- Brownfield Experiment Report: the controlled experiment behind the methodology
- Fair Comparison Report: three recovery approaches compared with identical team answers
- Peter Naur, "Programming as Theory Building" (1985): https://pages.cs.wisc.edu/~remzi/Naur.pdf
See also
- Semantic Anchor Translator Skill: recognises verbose concept descriptions and suggests the established anchor term