# Socratic Code Theory Recovery

*Can an LLM reverse-engineer software documentation from code?*
A controlled experiment measuring what LLMs can and cannot recover from source code alone. We deleted all documentation from a well-documented project, asked an LLM to reconstruct it, and compared the output against the originals.
## Key Findings

### LLM recovers from code
Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, a glossary (31 vs. 2 terms), and security documentation. In some areas, the generated output was better than the original.
### LLM cannot recover from code
Business context (why, against whom), design rationale (why alternative A over B), quality goal priorities, stakeholder concerns, aspirational features, performance budgets. Code is the result of decisions, not the decision itself.
### 11 questions close the gap
The two-phase workflow identifies exactly what the team needs to provide. In our experiment, 11 targeted questions (routed by role) were sufficient to produce documentation matching the original's ADR topics, quality goals, and performance budgets.
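The routing step can be pictured as a small data structure. A minimal sketch, with entirely hypothetical questions and role names, of how gap questions might be grouped by the role best placed to answer them:

```python
from collections import defaultdict

# Hypothetical gap questions, as the two-phase workflow might emit them:
# each names the missing knowledge and the role best placed to answer it.
questions = [
    {"id": "Q1", "role": "architect",     "text": "Why an in-memory store instead of a database?"},
    {"id": "Q2", "role": "product-owner", "text": "Who are the stakeholders, and what do they fear?"},
    {"id": "Q3", "role": "architect",     "text": "Which quality goal outranks the others?"},
    {"id": "Q4", "role": "ops",           "text": "What is the performance budget for a rebuild?"},
]

def route(questions):
    """Group open questions by the role that should answer them."""
    by_role = defaultdict(list)
    for q in questions:
        by_role[q["role"]].append(q)
    return dict(by_role)

routed = route(questions)
for role, qs in sorted(routed.items()):
    print(role, [q["id"] for q in qs])
```

The point of routing is cost: each team member sees only the handful of questions that match their knowledge, not the whole gap list.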
### Semantic Anchors validated
Terms like "arc42", "Cockburn", and "Nygard ADR" serve as both prompt compression (69 lines of prompt produce 3,850 lines of correct output) and decomposition heuristics ("arc42" automatically expands into 12 MECE sub-questions).
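The decomposition is deterministic because the anchor names a fixed standard: "arc42" pins down the template's twelve sections. A minimal sketch (the section names follow the arc42 template; the sub-question wording is a hypothetical rendering of the expansion step):

```python
# The twelve sections of the arc42 template. The anchor "arc42" expands
# into one sub-question per section -- MECE by construction, because the
# template itself partitions the documentation space.
ARC42_SECTIONS = [
    "Introduction and Goals", "Architecture Constraints", "Context and Scope",
    "Solution Strategy", "Building Block View", "Runtime View",
    "Deployment View", "Crosscutting Concepts", "Architecture Decisions",
    "Quality Requirements", "Risks and Technical Debt", "Glossary",
]

def decompose(anchor: str) -> list[str]:
    """Expand a semantic anchor into MECE sub-questions (sketch)."""
    if anchor == "arc42":
        return [f"What belongs in '{s}' for this system?" for s in ARC42_SECTIONS]
    raise KeyError(f"no decomposition known for anchor {anchor!r}")

subs = decompose("arc42")
print(len(subs))  # 12 sub-questions, one per template section
```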
## Three Approaches Compared
| Approach | Score | Strength | Report |
|---|---|---|---|
| Direct | 17.5/30 | Most detailed functional requirements, inline threat model | Detailed report |
| Socratic | 18.5/30 | Only version with correct quality goal priorities, most efficient (21% of original) | Detailed report |
| Two-Phase | 22/30 | All 5 ADR topics correct, highest traceability (50 team-answer markers) | Detailed report |
See also: Fair Comparison (all with team answers) · Semantic Traceability Matrix
## Reproduce the Experiment

All prompts are available in the `prompts/` directory. Use them on the Bausteinsicht repo (branch `brownfield`) or on your own project.
| Prompt | Lines | Use when |
|---|---|---|
| `01-direct.md` | 69 | Quick documentation from code alone |
| `02-socratic.md` | 97 | Identifying knowledge gaps |
| `03-twophase-p1.md` | 51 | Phase 1: build the question tree |
| `04-twophase-p2.md` | 61 | Phase 2: synthesize with team answers |
| `05-reconcile.md` | 82 | Detect spec drift |
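To run a prompt against a codebase, one minimal approach is to concatenate the prompt with the source files into a single payload for the model. A sketch, assuming nothing about the repo layout (the example paths and extensions in the comment are placeholders, adapt them to your project):

```python
from pathlib import Path

def build_payload(prompt_path: str, src_dir: str, exts=(".py",)) -> str:
    """Concatenate a prompt file with all matching source files,
    separating each file with a header naming its path."""
    parts = [Path(prompt_path).read_text()]
    for f in sorted(Path(src_dir).rglob("*")):
        if f.suffix in exts:
            parts.append(f"\n--- {f} ---\n{f.read_text()}")
    return "".join(parts)

# Example (hypothetical layout and extension):
#   payload = build_payload("prompts/01-direct.md", "src", exts=(".java",))
# Paste `payload` into your LLM of choice, or send it via its API.
```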
## Built on Theory
Peter Naur argued that a program's "theory" (the mental model of how the problem maps to the solution) cannot be fully documented. This experiment probes that claim: for LLM-generated code, the theory can be externalized as structured documentation; for legacy code, a recursive question tree can recover most of it.