# Socratic Code Theory Recovery

*Can an LLM reverse-engineer software documentation from code?*
A controlled experiment measuring what LLMs can and cannot recover from source code alone. We deleted all documentation from a well-documented project, asked an LLM to reconstruct it, and compared the output against the originals.
## Key Findings

### LLM recovers from code
Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, a glossary (31 vs. 2 terms), and security documentation. In some areas, the generated output was better than the original.
### LLM cannot recover from code
Business context (why, against whom), design rationale (why alternative A over B), quality goal priorities, stakeholder concerns, aspirational features, performance budgets. Code is the result of decisions, not the decision itself.
### 11 questions close the gap
The two-phase workflow identifies exactly what the team needs to provide. In our experiment, 11 targeted questions (routed by role) were sufficient to produce documentation matching the original's ADR topics, quality goals, and performance budgets.
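The routing step can be pictured as a small data structure. A minimal sketch, with entirely hypothetical questions and role names, of how gap questions might be grouped by the role best placed to answer them:

```python
from collections import defaultdict

# Hypothetical gap questions, as the two-phase workflow might emit them:
# each names the missing knowledge and the role best placed to answer it.
questions = [
    {"id": "Q1", "role": "architect",     "text": "Why an in-memory store instead of a database?"},
    {"id": "Q2", "role": "product-owner", "text": "Who are the stakeholders, and what do they fear?"},
    {"id": "Q3", "role": "architect",     "text": "Which quality goal outranks the others?"},
    {"id": "Q4", "role": "ops",           "text": "What is the performance budget for a rebuild?"},
]

def route(questions):
    """Group open questions by the role that should answer them."""
    by_role = defaultdict(list)
    for q in questions:
        by_role[q["role"]].append(q)
    return dict(by_role)

routed = route(questions)
for role, qs in sorted(routed.items()):
    print(role, [q["id"] for q in qs])
```

The point of routing is cost: each team member sees only the handful of questions that match their knowledge, not the whole gap list.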
### Semantic Anchors validated
Terms like "arc42", "Cockburn", and "Nygard ADR" serve as both prompt compression (69 lines of prompt produce 3,850 lines of correct output) and decomposition heuristics ("arc42" automatically expands into 12 MECE sub-questions).
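The decomposition is deterministic because the anchor names a fixed standard: "arc42" pins down the template's twelve sections. A minimal sketch (the section names follow the arc42 template; the sub-question wording is a hypothetical rendering of the expansion step):

```python
# The twelve sections of the arc42 template. The anchor "arc42" expands
# into one sub-question per section -- MECE by construction, because the
# template itself partitions the documentation space.
ARC42_SECTIONS = [
    "Introduction and Goals", "Architecture Constraints", "Context and Scope",
    "Solution Strategy", "Building Block View", "Runtime View",
    "Deployment View", "Crosscutting Concepts", "Architecture Decisions",
    "Quality Requirements", "Risks and Technical Debt", "Glossary",
]

def decompose(anchor: str) -> list[str]:
    """Expand a semantic anchor into MECE sub-questions (sketch)."""
    if anchor == "arc42":
        return [f"What belongs in '{s}' for this system?" for s in ARC42_SECTIONS]
    raise KeyError(f"no decomposition known for anchor {anchor!r}")

subs = decompose("arc42")
print(len(subs))  # 12 sub-questions, one per template section
```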
## Three Approaches Compared
| Approach | Score | Strength | Report |
|---|---|---|---|
| Direct | 17.5/30 | Most detailed functional requirements, inline threat model | Detailed report |
| Socratic | 18.5/30 | Only version with correct quality goal priorities, most efficient (21% of original) | Detailed report |
| Two-Phase | 22/30 | All 5 ADR topics correct, highest traceability (50 team-answer markers) | Detailed report |
See also: Fair Comparison (all with team answers) · Semantic Traceability Matrix
## Reproduce the Experiment

All prompts are available in the `prompts/` directory. Use them on the Bausteinsicht repo (branch `brownfield`) or on your own project.
| Prompt | Lines | Use when |
|---|---|---|
| `01-direct.md` | 69 | Quick documentation from code alone |
| `02-socratic.md` | 97 | Identifying knowledge gaps |
| `03-twophase-p1.md` | 51 | Phase 1: build the question tree |
| `04-twophase-p2.md` | 61 | Phase 2: synthesize with team answers |
| `05-reconcile.md` | 82 | Detect spec drift |
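To run a prompt against a codebase, one minimal approach is to concatenate the prompt with the source files into a single payload for the model. A sketch, assuming nothing about the repo layout (the example paths and extensions in the comment are placeholders, adapt them to your project):

```python
from pathlib import Path

def build_payload(prompt_path: str, src_dir: str, exts=(".py",)) -> str:
    """Concatenate a prompt file with all matching source files,
    separating each file with a header naming its path."""
    parts = [Path(prompt_path).read_text()]
    for f in sorted(Path(src_dir).rglob("*")):
        if f.suffix in exts:
            parts.append(f"\n--- {f} ---\n{f.read_text()}")
    return "".join(parts)

# Example (hypothetical layout and extension):
#   payload = build_payload("prompts/01-direct.md", "src", exts=(".java",))
# Paste `payload` into your LLM of choice, or send it via its API.
```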
## Built on Theory
Peter Naur argued that a program's "theory" (the mental model of how the problem maps to the solution) cannot be fully documented. This experiment probes that claim: for LLM-generated code, the theory can be externalized as structured documentation; for legacy code, a recursive question tree can recover most of it.