Brownfield Experiment 1c: Socratic Code Theory Recovery
1. Experiment Design
1.1. Hypothesis
The Socratic Code Theory Recovery method inverts the direct approach (Experiment 1a), which generates documentation first and flags uncertainties as an afterthought. This leads to "confidence inflation" — the LLM writes plausible-sounding rationale instead of admitting ignorance. A question-first approach that recursively decomposes high-level questions until each leaf is either answered from code or marked as unanswerable should produce more honest Open Questions and a clearer separation of known vs. unknown.
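The mechanism is easiest to see as pseudocode. The Go sketch below illustrates only the control flow: `Question`, `answerFromCode`, and `decompose` are hypothetical stand-ins for judgments the LLM makes at each node, not code from the experiment.

```go
package socratic

// Question is one node of the Question Tree. Hypothetical type: in the
// experiment the tree lives in QUESTION_TREE.adoc, not in code.
type Question struct {
	ID       string // e.g. "Q-2.3.1"
	Text     string
	Children []*Question
	Answer   string // evidence-backed answer, if found
	Open     bool   // true = leaf marked unanswerable from code alone
}

// Stubs standing in for the LLM's judgment at each node (assumptions).
func answerFromCode(q *Question) (answer string, ok bool) { return "", false }
func decompose(q *Question) []*Question                   { return nil }

// refine is the question-first loop the prompt encodes: answer from code
// evidence if possible; otherwise decompose (e.g. along arc42 chapters or
// ISO 25010 categories) and recurse; if no decomposition is possible, mark
// the leaf open instead of inventing an answer.
func refine(q *Question) {
	if answer, ok := answerFromCode(q); ok {
		q.Answer = answer
		return
	}
	q.Children = decompose(q)
	if len(q.Children) == 0 {
		q.Open = true // honest leaf: routed to the Open Questions list
		return
	}
	for _, sub := range q.Children {
		refine(sub)
	}
}
```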
1.2. Setup
- Project: Bausteinsicht (same as Experiment 1a)
- Branch: `brownfield-socratic` (same deletions as 1a: `src/docs/`, `CLAUDE.md`)
- Prompt: `brownfield-experiment-prompt-socratic.md` (97 lines, recursive question refinement)
- LLM: Claude (fresh session)
- Comparison against: original documentation AND Experiment 1a (Direct approach) output
1.3. Key Differences from Experiment 1a

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Starting point | "Generate these artifacts" | "Answer these 5 questions" |
| Process | Template-driven, sequential | Question-driven, recursive |
| Primary output | Documentation files | Question Tree + synthesized docs |
| Uncertainty handling | Flag as afterthought (OQ list) | Core mechanism (every leaf = answered or open) |
| Depth control | Fixed by template | Emergent from decomposition |
2. Results at a Glance

| Metric | Original | Direct (1a) | Socratic (1c) |
|---|---|---|---|
| Total lines of docs | ~13,800 | 3,850 | 1,522 (+ 1,093 Question Tree) |
| PRD: Functional Requirements | 7 FRs | 21 FRs | 9 FRs |
| PRD: Non-Functional Requirements | 4 NFRs | 13 NFRs | 3 (embedded in narrative) |
| Use Cases | 8 | 9 | 9 |
| Acceptance Criteria | 40 | 69 | ~20 |
| arc42 chapters (lines) | 1,300 | 752 | 429 |
| ADRs | 5 | 6 | 3 |
| Open Questions | — | 33 | 26 |
| Question Tree | — | — | 166 questions, 1,093 lines |
| Glossary | 2 (placeholder) | 31 | 19 |
3. The Question Tree
The Socratic approach produced a unique artifact: `QUESTION_TREE.adoc`, with 166 questions across 3 levels of depth.
3.1. Tree Structure

| Branch | Topic | Questions | Answered |
|---|---|---|---|
| Q-1 | Problem and Users | ~15 | 13 (87%) |
| Q-2 | Specification | ~38 | 36 (95%) |
| Q-3 | Architecture (arc42) | ~60 | 50 (83%) |
| Q-4 | Quality Goals (ISO 25010) | ~30 | 24 (80%) |
| Q-5 | Risks and Technical Debt | ~23 | 17 (74%) |
| Total | | 166 | 140 (84%) |
84% of questions were answered with code evidence. 16% remained open.
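This section does not quote the tree verbatim, so the following is a hypothetical reconstruction of two leaves, assembled from elements cited elsewhere in this comparison (the Q-2.3.x ID scheme, the `TestInitCreatesFiles` test, the Ask role). The headings, statuses, and wording are invented for illustration:

```asciidoc
=== Q-2.3 How does project initialization behave?

==== Q-2.3.1 Which files does init create? [ANSWERED]
Evidence: `TestInitCreatesFiles` exercises the init command's outputs.

==== Q-2.3.2 Why were these defaults chosen? [OPEN]
No code evidence for the rationale. Routed to OPEN_QUESTIONS.adoc
(Ask role to be assigned).
```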
3.2. Decomposition Quality
The tree uses Semantic Anchors as decomposition guides:
- "What is the architecture?" decomposes along arc42 chapters (12 sub-questions)
- "What is the specification?" decomposes into Cockburn Use Cases, CLI spec, data models
- "What quality goals?" decomposes along ISO 25010 categories
This confirms that Semantic Anchors work not just as prompt compression (Experiment 1a finding) but also as decomposition heuristics. The terms carry enough structure to guide a MECE breakdown.
3.3. MECE Assessment
Strong MECE: Q-2 (Specification) and Q-3 (Architecture) — no overlap, full coverage.
Weak MECE: Q-5 (Risks) — ad-hoc categories with overlaps (missing docs vs. missing schema, edge-case fragility vs. operational risks). A risk taxonomy like STRIDE would improve this branch.
Overlap: Q-3.10 (quality requirements in arc42) and Q-4 (quality goals per ISO 25010) cover similar ground from different angles. Not a defect per se, but the boundary is unclear.
4. Open Questions: Direct vs. Socratic

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 33 | 26 |
| With Category | 33 (100%) | 10 (38%) |
| With Ask role | 17 (52%) | 7 (27%) |
| Confidence scoring | Yes (Low/Medium/High) | No |
| Code evidence for "unanswerable" | Sometimes | Consistently |
| False opens (should be closed) | 2 (5%) | 0 (0%) |
4.1. Where Socratic is more honest
The Direct approach produced 33 Open Questions, but 2 were false opens (already answerable from code) and several "Medium Confidence" items were really guesses dressed as analysis. The Socratic approach produced 26 Open Questions with zero false opens.
The key difference is structural: in the Socratic approach, the LLM must explicitly decide at each question "can I answer this?" before proceeding. In the Direct approach, the LLM writes documentation first and retrospectively considers what it might not know.
Example — ADR rationale:
- Direct: Writes 6 ADRs with plausible rationale, then flags 8 Design Rationale questions as "Medium Confidence" Open Questions. The reader sees both the ADR and the doubt — confusing.
- Socratic: Writes 3 ADRs, each marked with "[NOTE] This ADR was reverse-engineered from code (Q-3.9.2) — no authored ADR exists in the repository." Only 3 ADRs, but the reader knows exactly what is inferred vs. factual.
4.2. Where Socratic falls short
The Socratic approach categorized only 38% of Open Questions (vs. 100% in Direct) and assigned Ask roles to only 27% (vs. 52%). This is a prompt issue — the template was followed less consistently during the synthesis phase. The Question Tree leaves have better metadata than the synthesized `OPEN_QUESTIONS.adoc`.
5. Documentation Quality: Head-to-Head
5.1. PRD
- Direct wins on completeness: 21 FRs vs. 9, 13 NFRs vs. 3 embedded
- Socratic wins on readability: more concise, business-focused framing
- Both miss: competitive positioning (Structurizr, LikeC4)
5.2. Specification
- Direct wins on traceability: test function names cited inline (`// test: TestInitCreatesFiles`)
- Socratic wins on reasoning: Question IDs trace claims to the decomposition step that produced them (`(Q-2.3.1)`)
- Direct produces more ACs: 69 vs. ~20
- Both produce 9 Use Cases with equivalent coverage
5.3. arc42

| Ch. | Title | Direct | Socratic | Winner |
|---|---|---|---|---|
| 1 | Introduction and Goals | 36 lines | 37 lines | Tie |
| 2 | Constraints | 22 lines | 21 lines | Tie |
| 3 | Context and Scope | 53 lines | 30 lines | Direct (more detail) |
| 4 | Solution Strategy | 59 lines | 20 lines | Direct (significantly more) |
| 5 | Building Block View | 137 lines | 51 lines | Direct (diagrams + detail) |
| 6 | Runtime View | 138 lines | 49 lines | Direct (more scenarios) |
| 7 | Deployment View | 51 lines | 35 lines | Direct (slightly) |
| 8 | Crosscutting Concepts | 95 lines | 64 lines | Direct (more topics) |
| 9 | Architecture Decisions | 10 lines | 11 lines | Socratic (honest about missing ADRs) |
| 10 | Quality Requirements | 21 lines | 36 lines | Socratic (Q-references, more complete) |
| 11 | Risks and Technical Debt | 39 lines | 56 lines | Socratic (more risks, Q-structure) |
| 12 | Glossary | 31 lines | 19 lines | Direct (more terms) |
Direct wins 7 chapters on content volume and detail. Socratic wins 3 chapters (9, 10, 11) — the chapters where honesty about unknowns matters most.
5.4. ADRs

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 6 | 3 |
| Topics | DSL Format, CLI Framework, Sync Strategy, Conflict Policy, XML Library, Embedded Templates | DSL Format, Conflict Resolution, Pure Sync Function |
| Pugh Matrix | Yes (all 6) | Yes (all 3) |
| Transparency about inference | None | Every ADR marked as reverse-engineered from code |
The Direct approach produces more ADRs (6 vs. 3) but presents them without caveats. The Socratic approach produces fewer but explicitly marks each as "reverse-engineered from code." For an architect reading these, the Socratic ADRs are safer: you know what you’re getting.
6. Key Finding: Questions as Documentation Structure
The Question Tree creates a machine-readable knowledge taxonomy. Each Q-ID can be:
- Linked from arc42 chapters (the Socratic version does this throughout)
- Converted to issues in a tracker
- Used as a checklist for onboarding ("answer Q-1.1.2 before starting work")
- Versioned and diffed between runs
This is something the Direct approach cannot do. The Direct approach produces documentation; the Socratic approach produces documentation and a reasoning trace.
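As an illustration of the "versioned and diffed" point, here is a minimal Go sketch. It assumes the tree has already been parsed into a map from Q-ID to status; the types and function names are hypothetical, not from the experiment.

```go
package socratic

import "fmt"

// Status of a leaf in a parsed Question Tree (hypothetical
// representation of what QUESTION_TREE.adoc would be parsed into).
type Status string

const (
	Answered Status = "answered"
	Open     Status = "open"
)

// diffRuns reports questions whose status changed between two runs,
// e.g. an Open leaf that a later run could answer from new code.
func diffRuns(prev, next map[string]Status) []string {
	var changes []string
	for id, s := range next {
		if p, ok := prev[id]; !ok {
			changes = append(changes, fmt.Sprintf("%s: new question (%s)", id, s))
		} else if p != s {
			changes = append(changes, fmt.Sprintf("%s: %s -> %s", id, p, s))
		}
	}
	for id := range prev {
		if _, ok := next[id]; !ok {
			changes = append(changes, id+": removed")
		}
	}
	return changes
}
```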
7. When to Use Which Approach

| Scenario | Recommended | Why |
|---|---|---|
| Rapid brownfield documentation | Direct | More comprehensive output, professional format |
| Identifying knowledge gaps before onboarding | Socratic | Question Tree reveals what's unknown |
| Compliance / audit trail | Socratic | Explicit about what's inferred vs. factual |
| Stakeholder communication | Direct | Reads like standard arc42 |
| Preparing a brownfield project for the Dark Factory | Socratic first, then Direct | Questions first, then fill in documentation |
| Spec reconciliation (drift detection) | Direct | Needs comprehensive coverage to diff against existing spec |
8. Recommendation: Three-Phase Brownfield Workflow
The experiments suggest a combined approach:
- Phase 1 — Socratic: Run the question-driven prompt to build the Question Tree. This identifies what is knowable from code and what requires human input. Hand the Open Questions to the team.
- Phase 2 — Human input: The team answers the Open Questions (routed by the Ask role): business context, design rationale, quality goal priorities.
- Phase 3 — Direct: Run the template-driven prompt with the answered questions as additional context. This produces comprehensive documentation with the rationale gaps filled.
This combines the Socratic approach’s honesty with the Direct approach’s completeness.
9. Implications for Semantic Anchors
The Socratic experiment adds a third validated use of Semantic Anchors:
- Prompt compression (Experiment 1a): "arc42" triggers 12 chapters without definition.
- Decomposition heuristic (Experiment 1c): "arc42" guides MECE question decomposition.
- Quality bar (both): The anchor defines not just structure but expected rigor (Cockburn = actors + flows + postconditions, not just "list the features").
The decomposition use is particularly powerful. When the LLM encounters "What is the architecture?", the anchor "arc42" immediately provides 12 sub-questions. Without the anchor, the LLM would decompose ad-hoc, likely missing chapters like Deployment View or Glossary.
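To make the mechanism concrete, the Go sketch below treats the anchor as a lookup table from a broad question to its sub-questions. The arc42 chapter titles are the real ones from the table above; the lookup-table framing and all identifiers are illustrative, since the real expansion happens inside the LLM, not in code.

```go
package socratic

import "fmt"

// anchorDecompositions maps a Semantic Anchor to the sub-topics it
// expands into. Only "arc42" is shown; the chapter titles match the
// comparison table in section 5.3.
var anchorDecompositions = map[string][]string{
	"arc42": {
		"Introduction and Goals", "Constraints", "Context and Scope",
		"Solution Strategy", "Building Block View", "Runtime View",
		"Deployment View", "Crosscutting Concepts", "Architecture Decisions",
		"Quality Requirements", "Risks and Technical Debt", "Glossary",
	},
}

// expand turns a broad question into one sub-question per chapter,
// numbered under the parent Q-ID (the question wording is illustrative).
func expand(anchor, parentID string) []string {
	var subs []string
	for i, topic := range anchorDecompositions[anchor] {
		subs = append(subs, fmt.Sprintf("%s.%d What does '%s' cover here?", parentID, i+1, topic))
	}
	return subs
}
```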