Brownfield Experiment 1c: Socratic Code Theory Recovery

.1. Experiment Design

.1.1. Hypothesis

The Socratic Code Theory Recovery method inverts the direct approach (Experiment 1a), which generates documentation first and flags uncertainties as an afterthought. This leads to "confidence inflation" — the LLM writes plausible-sounding rationale instead of admitting ignorance. A question-first approach that recursively decomposes high-level questions until each leaf is either answered from code or marked as unanswerable should produce more honest Open Questions and a clearer separation of known vs. unknown.

.1.2. Setup

  • Project: Bausteinsicht (same as Experiment 1a)

  • Branch: brownfield-socratic (same deletions as 1a: src/docs/, CLAUDE.md)

  • Prompt: brownfield-experiment-prompt-socratic.md (97 lines, recursive question refinement)

  • LLM: Claude (fresh session)

  • Comparison against: Original documentation AND Experiment 1a (Direct approach) output

.1.3. Key Differences from Experiment 1a

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Starting point | "Generate these artifacts" | "Answer these 5 questions" |
| Process | Template-driven, sequential | Question-driven, recursive |
| Primary output | Documentation files | Question Tree + synthesized docs |
| Uncertainty handling | Flag as afterthought (OQ list) | Core mechanism (every leaf = answered or open) |
| Depth control | Fixed by template | Emergent from decomposition |

.2. Results at a Glance

| Metric | Original | Direct (1a) | Socratic (1c) |
|---|---|---|---|
| Total lines of docs | ~13,800 | 3,850 | 1,522 (+ 1,093 Question Tree) |
| PRD: Functional Requirements | 7 FRs | 21 FRs | 9 FRs |
| PRD: Non-Functional Requirements | 4 NFRs | 13 NFRs | 3 (embedded in narrative) |
| Use Cases | 8 | 9 | 9 |
| Acceptance Criteria | 40 | 69 | ~20 |
| arc42 chapters (lines) | 1,300 | 752 | 429 |
| ADRs | 5 | 6 | 3 |
| Open Questions | n/a | 33 | 26 |
| Question Tree | n/a | n/a | 166 questions, 1,093 lines |
| Glossary | 2 (placeholder) | 31 | 19 |

.3. The Question Tree

The Socratic approach produced a unique artifact: QUESTION_TREE.adoc with 166 questions across 3 levels of depth.
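To make the structure concrete, a fragment in the style of QUESTION_TREE.adoc might look as follows. This is a hypothetical reconstruction: the question wording and status annotations are invented for illustration; only the Q-ID scheme and the answered/open distinction come from the experiment output.

```asciidoc
=== Q-3.9 Which architecture decisions shaped this system?

==== Q-3.9.1 What DSL format does the tool accept?
Status: ANSWERED. The parser code and its tests define the accepted
grammar (code evidence).

==== Q-3.9.2 Why was this DSL format chosen over alternatives?
Status: OPEN. No authored ADR or design note exists in the repository;
the code shows what was built, not why. Ask: original developers.
```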

.3.1. Tree Structure

| Branch | Topic | Questions | Answered |
|---|---|---|---|
| Q-1 | Problem and Users | ~15 | 13 (87%) |
| Q-2 | Specification | ~38 | 36 (95%) |
| Q-3 | Architecture (arc42) | ~60 | 50 (83%) |
| Q-4 | Quality Goals (ISO 25010) | ~30 | 24 (80%) |
| Q-5 | Risks and Technical Debt | ~23 | 17 (74%) |
| Total | | 166 | 140 (84%) |

84% of questions were answered with code evidence. 16% remained open.

.3.2. Decomposition Quality

The tree uses Semantic Anchors as decomposition guides:

  • "What is the architecture?" decomposes along arc42 chapters (12 sub-questions)

  • "What is the specification?" decomposes into Cockburn Use Cases, CLI spec, data models

  • "What quality goals?" decomposes along ISO 25010 categories

This confirms that Semantic Anchors work not just as prompt compression (Experiment 1a finding) but also as decomposition heuristics. The terms carry enough structure to guide a MECE breakdown.

.3.3. MECE Assessment

Strong MECE: Q-2 (Specification) and Q-3 (Architecture) — no overlap, full coverage.

Weak MECE: Q-5 (Risks) — ad-hoc categories with overlaps (missing docs vs. missing schema, edge-case fragility vs. operational risks). A risk taxonomy like STRIDE would improve this branch.

Overlap: Q-3.10 (quality requirements in arc42) and Q-4 (quality goals per ISO 25010) cover similar ground from different angles. Not a defect per se, but the boundary is unclear.

.4. Open Questions: Direct vs. Socratic

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 33 | 26 |
| With Category | 33 (100%) | 10 (38%) |
| With Ask role | 17 (52%) | 7 (27%) |
| Confidence scoring | Yes (Low/Medium/High) | No |
| Code evidence for "unanswerable" | Sometimes | Consistently |
| False opens (should be closed) | 2 (6%) | 0 (0%) |

.4.1. Where Socratic is more honest

The Direct approach produced 33 Open Questions, but 2 were false opens (already answerable from code) and several "Medium Confidence" items were really guesses dressed as analysis. The Socratic approach produced 26 Open Questions with zero false opens.

The key difference is structural: in the Socratic approach, the LLM must explicitly decide at each question "can I answer this?" before proceeding. In the Direct approach, the LLM writes documentation first and retrospectively considers what it might not know.

Example — ADR rationale:

  • Direct: Writes 6 ADRs with plausible rationale, then flags 8 Design Rationale questions as "Medium Confidence" Open Questions. The reader sees both the ADR and the doubt — confusing.

  • Socratic: Writes 3 ADRs, each marked with [NOTE] This ADR was reverse-engineered from code (Q-3.9.2) — no authored ADR exists in the repository. Only 3 ADRs, but the reader knows exactly what is inferred vs. factual.
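In AsciiDoc, that marker renders as an admonition block at the top of each ADR file. Reconstructed from the quoted text (the exact file layout is an assumption):

```asciidoc
[NOTE]
====
This ADR was reverse-engineered from code (Q-3.9.2) -- no authored ADR
exists in the repository.
====
```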

.4.2. Where Socratic falls short

The Socratic approach categorized only 38% of Open Questions (vs. 100% in Direct) and assigned Ask roles to only 27% (vs. 52%). This is a prompt issue — the template was followed less consistently during the synthesis phase. The Question Tree leaves have better metadata than the synthesized OPEN_QUESTIONS.adoc.

.5. Documentation Quality: Head-to-Head

.5.1. PRD

  • Direct wins on completeness: 21 FRs vs. 9, 13 NFRs vs. 3 embedded

  • Socratic wins on readability: More concise, business-focused framing

  • Both miss: Competitive positioning (Structurizr, LikeC4)

.5.2. Specification

  • Direct wins on traceability: Test function names cited inline (// test: TestInitCreatesFiles)

  • Socratic wins on reasoning: Question IDs trace claims to the decomposition step that produced them ((Q-2.3.1)); both citation styles are sketched after this list

  • Direct produces more ACs: 69 vs. ~20

  • Both produce 9 Use Cases with equivalent coverage
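The two citation styles side by side, on a hypothetical acceptance criterion (the AC numbers and wording are invented; the citation formats are the ones described above):

```text
Direct (1a):   AC-12: `init` creates config and template files.  // test: TestInitCreatesFiles
Socratic (1c): AC-07: `init` creates config and template files.  (Q-2.3.1)
```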

.5.3. arc42

| Ch. | Title | Direct | Socratic | Winner |
|---|---|---|---|---|
| 1 | Introduction and Goals | 36 lines | 37 lines | Tie |
| 2 | Constraints | 22 lines | 21 lines | Tie |
| 3 | Context and Scope | 53 lines | 30 lines | Direct (more detail) |
| 4 | Solution Strategy | 59 lines | 20 lines | Direct (significantly more) |
| 5 | Building Block View | 137 lines | 51 lines | Direct (diagrams + detail) |
| 6 | Runtime View | 138 lines | 49 lines | Direct (more scenarios) |
| 7 | Deployment View | 51 lines | 35 lines | Direct (slightly) |
| 8 | Crosscutting Concepts | 95 lines | 64 lines | Direct (more topics) |
| 9 | Architecture Decisions | 10 lines | 11 lines | Socratic (honest about missing ADRs) |
| 10 | Quality Requirements | 21 lines | 36 lines | Socratic (Q-references, more complete) |
| 11 | Risks and Technical Debt | 39 lines | 56 lines | Socratic (more risks, Q-structure) |
| 12 | Glossary | 31 lines | 19 lines | Direct (more terms) |

Direct wins 7 chapters on content volume and detail. Socratic wins 3 chapters (9, 10, 11) — the chapters where honesty about unknowns matters most.

.5.4. ADRs

| Dimension | Direct (1a) | Socratic (1c) |
|---|---|---|
| Count | 6 | 3 |
| Topics | DSL Format, CLI Framework, Sync Strategy, Conflict Policy, XML Library, Embedded Templates | DSL Format, Conflict Resolution, Pure Sync Function |
| Pugh Matrix | Yes (all 6) | Yes (all 3) |
| Transparency about inference | None | [NOTE] blocks on every ADR |

The Direct approach produces more ADRs (6 vs. 3) but presents them without caveats. The Socratic approach produces fewer but explicitly marks each as "reverse-engineered from code." For an architect reading these, the Socratic ADRs are safer: you know what you’re getting.
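For reference, a Pugh Matrix scores each alternative against a baseline, criterion by criterion (+ better, 0 equal, - worse). A hypothetical fragment for the DSL Format decision, with criteria and scores invented purely for illustration:

| Criterion | Custom DSL (baseline) | YAML | XML |
|---|---|---|---|
| Human readability | 0 | + | - |
| Library support | 0 | + | + |
| Domain expressiveness | 0 | - | - |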

.6. Key Finding: Questions as Documentation Structure

The Question Tree creates a machine-readable knowledge taxonomy. Each Q-ID can be:

  • Linked from arc42 chapters (the Socratic version does this throughout)

  • Converted to issues in a tracker

  • Used as a checklist for onboarding ("answer Q-1.1.2 before starting work")

  • Versioned and diffed between runs (a sketch follows below)

This is something the Direct approach cannot do. The Direct approach produces documentation; the Socratic approach produces documentation and a reasoning trace.
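A minimal sketch of the "versioned and diffed" use, in Go. It assumes each run's tree has first been exported to lines of `Q-ID<TAB>status`; that export step and the file names are hypothetical, since the experiment produced only the .adoc tree:

```go
// qdiff compares two Question Tree snapshots and reports which questions
// appeared, disappeared, or changed status between runs.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// load reads "Q-ID<TAB>status" lines into a map.
func load(path string) map[string]string {
	f, err := os.Open(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	m := make(map[string]string)
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if id, status, ok := strings.Cut(sc.Text(), "\t"); ok {
			m[id] = status // e.g. "Q-3.9.2" -> "open"
		}
	}
	return m
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: qdiff old.tsv new.tsv")
		os.Exit(2)
	}
	prev, cur := load(os.Args[1]), load(os.Args[2])
	for id, status := range cur {
		switch old, ok := prev[id]; {
		case !ok:
			fmt.Printf("NEW    %s (%s)\n", id, status)
		case old != status:
			fmt.Printf("MOVED  %s: %s -> %s\n", id, old, status)
		}
	}
	for id := range prev {
		if _, ok := cur[id]; !ok {
			fmt.Printf("GONE   %s\n", id)
		}
	}
}
```

Running `qdiff run1.tsv run2.tsv` flags exactly the questions a reviewer needs to re-read: new unknowns, resolved unknowns, and removed branches.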

.7. When to Use Which Approach

| Scenario | Recommended | Why |
|---|---|---|
| Rapid brownfield documentation | Direct | More comprehensive output, professional format |
| Identifying knowledge gaps before onboarding | Socratic | Question Tree reveals what's unknown |
| Compliance / audit trail | Socratic | Explicit about what's inferred vs. factual |
| Stakeholder communication | Direct | Reads like standard arc42 |
| Preparing a Brownfield project for the Dark Factory | Socratic first, then Direct | Questions first, then fill in documentation |
| Spec reconciliation (drift detection) | Direct | Needs comprehensive coverage to diff against existing spec |

.8. Recommendation: Three-Phase Brownfield Workflow

The experiments suggest a combined approach:

  1. Phase 1 — Socratic: Run the question-driven prompt to build the Question Tree. This identifies what is knowable from code and what requires human input. Hand the Open Questions to the team.

  2. Phase 2 — Human input: The team answers the Open Questions (routed by the Ask role). Business context, design rationale, quality goal priorities.

  3. Phase 3 — Direct: Run the template-driven prompt with the answered questions as additional context. This produces comprehensive documentation with the rationale gaps filled.

This combines the Socratic approach’s honesty with the Direct approach’s completeness.

.9. Implications for Semantic Anchors

The Socratic experiment adds a third validated use of Semantic Anchors:

  1. Prompt compression (Experiment 1a): "arc42" triggers 12 chapters without definition.

  2. Decomposition heuristic (Experiment 1c): "arc42" guides MECE question decomposition.

  3. Quality bar (both): The anchor defines not just structure but expected rigor (Cockburn = actors + flows + postconditions, not just "list the features").

The decomposition use is particularly powerful. When the LLM encounters "What is the architecture?", the anchor "arc42" immediately provides 12 sub-questions. Without the anchor, the LLM would decompose ad-hoc, likely missing chapters like Deployment View or Glossary.
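Illustratively, the expansion looks like this (a hypothetical trace, not a quote from the experiment transcript):

```text
Q-3: What is the architecture?        [Anchor: arc42]
  -> Q-3.1  Introduction and Goals
  -> Q-3.2  Constraints
  ...
  -> Q-3.7  Deployment View
  ...
  -> Q-3.12 Glossary
```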