Brownfield Experiment 1c: Socratic Code Theory Recovery
1. Experiment Design
1.1. Hypothesis
The Socratic Code Theory Recovery method inverts the direct approach (Experiment 1a), which generates documentation first and flags uncertainties as an afterthought. This leads to "confidence inflation" — the LLM writes plausible-sounding rationale instead of admitting ignorance. A question-first approach that recursively decomposes high-level questions until each leaf is either answered from code or marked as unanswerable should produce more honest Open Questions and a clearer separation of known vs. unknown.
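The mechanism is easiest to see as pseudocode. The Go sketch below illustrates only the control flow: `Question`, `answerFromCode`, and `decompose` are hypothetical stand-ins for judgments the LLM makes at each node, not code from the experiment.

```go
package socratic

// Question is one node of the Question Tree. Hypothetical type: in the
// experiment the tree lives in QUESTION_TREE.adoc, not in code.
type Question struct {
	ID       string // e.g. "Q-2.3.1"
	Text     string
	Children []*Question
	Answer   string // evidence-backed answer, if found
	Open     bool   // true = leaf marked unanswerable from code alone
}

// Stubs standing in for the LLM's judgment at each node (assumptions).
func answerFromCode(q *Question) (answer string, ok bool) { return "", false }
func decompose(q *Question) []*Question                   { return nil }

// refine is the question-first loop the prompt encodes: answer from code
// evidence if possible; otherwise decompose (e.g. along arc42 chapters or
// ISO 25010 categories) and recurse; if no decomposition is possible, mark
// the leaf open instead of inventing an answer.
func refine(q *Question) {
	if answer, ok := answerFromCode(q); ok {
		q.Answer = answer
		return
	}
	q.Children = decompose(q)
	if len(q.Children) == 0 {
		q.Open = true // honest leaf: routed to the Open Questions list
		return
	}
	for _, sub := range q.Children {
		refine(sub)
	}
}
```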
1.2. Setup
- Project: Bausteinsicht (same as Experiment 1a)
- Branch: `brownfield-socratic` (same deletions as 1a: `src/docs/`, `CLAUDE.md`)
- Prompt: `brownfield-experiment-prompt-socratic.md` (97 lines, recursive question refinement)
- LLM: Claude (fresh session)
- Comparison against: original documentation AND Experiment 1a (Direct approach) output
1.3. Key Differences from Experiment 1a

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Starting point | "Generate these artifacts" | "Answer these 5 questions" |
| Process | Template-driven, sequential | Question-driven, recursive |
| Primary output | Documentation files | Question Tree + synthesized docs |
| Uncertainty handling | Flag as afterthought (OQ list) | Core mechanism (every leaf = answered or open) |
| Depth control | Fixed by template | Emergent from decomposition |
2. Results at a Glance

| Metric | Original | Direct (1a) | Socratic (1c) |
|---|---|---|---|
| Total lines of docs | ~13,800 | 3,850 | 1,522 (+ 1,093 Question Tree) |
| PRD: Functional Requirements | 7 FRs | 21 FRs | 9 FRs |
| PRD: Non-Functional Requirements | 4 NFRs | 13 NFRs | 3 (embedded in narrative) |
| Use Cases | 8 | 9 | 9 |
| Acceptance Criteria | 40 | 69 | ~20 |
| arc42 chapters (lines) | 1,300 | 752 | 429 |
| ADRs | 5 | 6 | 3 |
| Open Questions | — | 33 | 26 |
| Question Tree | — | — | 166 questions, 1,093 lines |
| Glossary | 2 (placeholder) | 31 | 19 |
3. The Question Tree
The Socratic approach produced a unique artifact: `QUESTION_TREE.adoc`, with 166 questions across 3 levels of depth.
3.1. Tree Structure

| Branch | Topic | Questions | Answered |
|---|---|---|---|
| Q-1 | Problem and Users | ~15 | 13 (87%) |
| Q-2 | Specification | ~38 | 36 (95%) |
| Q-3 | Architecture (arc42) | ~60 | 50 (83%) |
| Q-4 | Quality Goals (ISO 25010) | ~30 | 24 (80%) |
| Q-5 | Risks and Technical Debt | ~23 | 17 (74%) |
| Total | | 166 | 140 (84%) |
84% of questions were answered with code evidence. 16% remained open.
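This section does not quote the tree verbatim, so the following is a hypothetical reconstruction of two leaves, assembled from elements cited elsewhere in this comparison (the Q-2.3.x ID scheme, the `TestInitCreatesFiles` test, the Ask role). The headings, statuses, and wording are invented for illustration:

```asciidoc
=== Q-2.3 How does project initialization behave?

==== Q-2.3.1 Which files does init create? [ANSWERED]
Evidence: `TestInitCreatesFiles` exercises the init command's outputs.

==== Q-2.3.2 Why were these defaults chosen? [OPEN]
No code evidence for the rationale. Routed to OPEN_QUESTIONS.adoc
(Ask role to be assigned).
```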
3.2. Decomposition Quality
The tree uses Semantic Anchors as decomposition guides:
- "What is the architecture?" decomposes along arc42 chapters (12 sub-questions)
- "What is the specification?" decomposes into Cockburn Use Cases, CLI spec, data models
- "What quality goals?" decomposes along ISO 25010 categories
This confirms that Semantic Anchors work not just as prompt compression (Experiment 1a finding) but also as decomposition heuristics. The terms carry enough structure to guide a MECE breakdown.
3.3. MECE Assessment
Strong MECE: Q-2 (Specification) and Q-3 (Architecture) — no overlap, full coverage.
Weak MECE: Q-5 (Risks) — ad-hoc categories with overlaps (missing docs vs. missing schema, edge-case fragility vs. operational risks). A risk taxonomy like STRIDE would improve this branch.
Overlap: Q-3.10 (quality requirements in arc42) and Q-4 (quality goals per ISO 25010) cover similar ground from different angles. Not a defect per se, but the boundary is unclear.
4. Open Questions: Direct vs. Socratic

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 33 | 26 |
| With Category | 33 (100%) | 10 (38%) |
| With Ask role | 17 (52%) | 7 (27%) |
| Confidence scoring | Yes (Low/Medium/High) | No |
| Code evidence for "unanswerable" | Sometimes | Consistently |
| False opens (should be closed) | 2 (5%) | 0 (0%) |
4.1. Where Socratic is more honest
The Direct approach produced 33 Open Questions, but 2 were false opens (already answerable from code) and several "Medium Confidence" items were really guesses dressed as analysis. The Socratic approach produced 26 Open Questions with zero false opens.
The key difference is structural: in the Socratic approach, the LLM must explicitly decide at each question "can I answer this?" before proceeding. In the Direct approach, the LLM writes documentation first and retrospectively considers what it might not know.
Example — ADR rationale:
- Direct: Writes 6 ADRs with plausible rationale, then flags 8 Design Rationale questions as "Medium Confidence" Open Questions. The reader sees both the ADR and the doubt — confusing.
- Socratic: Writes 3 ADRs, each marked with "[NOTE] This ADR was reverse-engineered from code (Q-3.9.2) — no authored ADR exists in the repository." Only 3 ADRs, but the reader knows exactly what is inferred vs. factual.
4.2. Where Socratic falls short
The Socratic approach categorized only 38% of Open Questions (vs. 100% in Direct) and assigned Ask roles to only 27% (vs. 52%). This is a prompt issue — the template was followed less consistently during the synthesis phase. The Question Tree leaves have better metadata than the synthesized `OPEN_QUESTIONS.adoc`.
5. Documentation Quality: Head-to-Head
5.1. PRD
- Direct wins on completeness: 21 FRs vs. 9, 13 NFRs vs. 3 embedded
- Socratic wins on readability: more concise, business-focused framing
- Both miss: competitive positioning (Structurizr, LikeC4)
5.2. Specification
- Direct wins on traceability: test function names cited inline (`// test: TestInitCreatesFiles`)
- Socratic wins on reasoning: Question IDs trace claims to the decomposition step that produced them (`(Q-2.3.1)`)
- Direct produces more ACs: 69 vs. ~20
- Both produce 9 Use Cases with equivalent coverage
5.3. arc42

| Ch. | Title | Direct | Socratic | Winner |
|---|---|---|---|---|
| 1 | Introduction and Goals | 36 lines | 37 lines | Tie |
| 2 | Constraints | 22 lines | 21 lines | Tie |
| 3 | Context and Scope | 53 lines | 30 lines | Direct (more detail) |
| 4 | Solution Strategy | 59 lines | 20 lines | Direct (significantly more) |
| 5 | Building Block View | 137 lines | 51 lines | Direct (diagrams + detail) |
| 6 | Runtime View | 138 lines | 49 lines | Direct (more scenarios) |
| 7 | Deployment View | 51 lines | 35 lines | Direct (slightly) |
| 8 | Crosscutting Concepts | 95 lines | 64 lines | Direct (more topics) |
| 9 | Architecture Decisions | 10 lines | 11 lines | Socratic (honest about missing ADRs) |
| 10 | Quality Requirements | 21 lines | 36 lines | Socratic (Q-references, more complete) |
| 11 | Risks and Technical Debt | 39 lines | 56 lines | Socratic (more risks, Q-structure) |
| 12 | Glossary | 31 lines | 19 lines | Direct (more terms) |
Direct wins 7 chapters on content volume and detail. Socratic wins 3 chapters (9, 10, 11) — the chapters where honesty about unknowns matters most.
5.4. ADRs

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 6 | 3 |
| Topics | DSL Format, CLI Framework, Sync Strategy, Conflict Policy, XML Library, Embedded Templates | DSL Format, Conflict Resolution, Pure Sync Function |
| Pugh Matrix | Yes (all 6) | Yes (all 3) |
| Transparency about inference | None | Every ADR marked as reverse-engineered from code |
The Direct approach produces more ADRs (6 vs. 3) but presents them without caveats. The Socratic approach produces fewer but explicitly marks each as "reverse-engineered from code." For an architect reading these, the Socratic ADRs are safer: you know what you’re getting.
6. Key Finding: Questions as Documentation Structure
The Question Tree creates a machine-readable knowledge taxonomy. Each Q-ID can be:
- Linked from arc42 chapters (the Socratic version does this throughout)
- Converted to issues in a tracker
- Used as a checklist for onboarding ("answer Q-1.1.2 before starting work")
- Versioned and diffed between runs
This is something the Direct approach cannot do. The Direct approach produces documentation; the Socratic approach produces documentation and a reasoning trace.
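As an illustration of the "versioned and diffed" point, here is a minimal Go sketch. It assumes the tree has already been parsed into a map from Q-ID to status; the types and function names are hypothetical, not from the experiment.

```go
package socratic

import "fmt"

// Status of a leaf in a parsed Question Tree (hypothetical
// representation of what QUESTION_TREE.adoc would be parsed into).
type Status string

const (
	Answered Status = "answered"
	Open     Status = "open"
)

// diffRuns reports questions whose status changed between two runs,
// e.g. an Open leaf that a later run could answer from new code.
func diffRuns(prev, next map[string]Status) []string {
	var changes []string
	for id, s := range next {
		if p, ok := prev[id]; !ok {
			changes = append(changes, fmt.Sprintf("%s: new question (%s)", id, s))
		} else if p != s {
			changes = append(changes, fmt.Sprintf("%s: %s -> %s", id, p, s))
		}
	}
	for id := range prev {
		if _, ok := next[id]; !ok {
			changes = append(changes, id+": removed")
		}
	}
	return changes
}
```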
7. When to Use Which Approach

| Scenario | Recommended | Why |
|---|---|---|
| Rapid brownfield documentation | Direct | More comprehensive output, professional format |
| Identifying knowledge gaps before onboarding | Socratic | Question Tree reveals what's unknown |
| Compliance / audit trail | Socratic | Explicit about what's inferred vs. factual |
| Stakeholder communication | Direct | Reads like standard arc42 |
| Preparing a brownfield project for the Dark Factory | Socratic first, then Direct | Questions first, then fill in documentation |
| Spec reconciliation (drift detection) | Direct | Needs comprehensive coverage to diff against existing spec |
8. Recommendation: Three-Phase Brownfield Workflow
The experiments suggest a combined approach:
- Phase 1 — Socratic: Run the question-driven prompt to build the Question Tree. This identifies what is knowable from code and what requires human input. Hand the Open Questions to the team.
- Phase 2 — Human input: The team answers the Open Questions (routed by the Ask role): business context, design rationale, quality goal priorities.
- Phase 3 — Direct: Run the template-driven prompt with the answered questions as additional context. This produces comprehensive documentation with the rationale gaps filled.
This combines the Socratic approach’s honesty with the Direct approach’s completeness.
9. Implications for Semantic Anchors
The Socratic experiment adds a third validated use of Semantic Anchors:
- Prompt compression (Experiment 1a): "arc42" triggers 12 chapters without definition.
- Decomposition heuristic (Experiment 1c): "arc42" guides MECE question decomposition.
- Quality bar (both): The anchor defines not just structure but expected rigor (Cockburn = actors + flows + postconditions, not just "list the features").
The decomposition use is particularly powerful. When the LLM encounters "What is the architecture?", the anchor "arc42" immediately provides 12 sub-questions. Without the anchor, the LLM would decompose ad-hoc, likely missing chapters like Deployment View or Glossary.
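To make the mechanism concrete, the Go sketch below treats the anchor as a lookup table from a broad question to its sub-questions. The arc42 chapter titles are the real ones from the table above; the lookup-table framing and all identifiers are illustrative, since the real expansion happens inside the LLM, not in code.

```go
package socratic

import "fmt"

// anchorDecompositions maps a Semantic Anchor to the sub-topics it
// expands into. Only "arc42" is shown; the chapter titles match the
// comparison table in section 5.3.
var anchorDecompositions = map[string][]string{
	"arc42": {
		"Introduction and Goals", "Constraints", "Context and Scope",
		"Solution Strategy", "Building Block View", "Runtime View",
		"Deployment View", "Crosscutting Concepts", "Architecture Decisions",
		"Quality Requirements", "Risks and Technical Debt", "Glossary",
	},
}

// expand turns a broad question into one sub-question per chapter,
// numbered under the parent Q-ID (the question wording is illustrative).
func expand(anchor, parentID string) []string {
	var subs []string
	for i, topic := range anchorDecompositions[anchor] {
		subs = append(subs, fmt.Sprintf("%s.%d What does '%s' cover here?", parentID, i+1, topic))
	}
	return subs
}
```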