Adapting the Workflow to Brownfield Projects

Introduction

You have mastered the greenfield workflow. Now you want to apply it to an existing codebase.

The core principles remain the same: small steps, high autonomy, error-correction loops. But brownfield projects add a challenge that greenfield projects do not have: the system already exists. You cannot start with a blank slate. You must understand what is there before you change it.

This document describes how to onboard an existing codebase into the spec-driven workflow. The key insight comes from Simon Martinelli’s AI Unified Process: you do not need to specify the entire system. You work one bounded context at a time. Spec coverage grows incrementally, feature by feature.

The Brownfield Paradox

In greenfield projects, you write the spec first and the code follows. In brownfield projects, the code already exists — but the spec often does not. The system is the specification, except nobody can read it.

The temptation is to reverse-engineer the entire system into documentation before changing anything. Do not do that. That is Big Upfront Documentation, and it fails for the same reasons as Big Upfront Design.

Instead: specify only the bounded context you are about to change, and only as much as you need to change it safely.

Phase 0: Scope a Bounded Context

Before touching code, identify the area you want to change.

A bounded context is a coherent slice of the system with clear boundaries. It might be a module, a service, a feature area, or a screen. The context should be small enough that you can understand it in a single session.

Use Domain-Driven Design to identify the context boundary. The AI can help: point it at the code and ask it to identify bounded contexts and their interfaces.

Prompt
Analyze the codebase in src/. Identify bounded contexts using Domain-Driven Design.
For each context, list: name, responsibility, key entities, interfaces to other contexts.
Present as a table.

Pick one bounded context to start with. Choose one that is small, well-isolated, and has a change request pending.

Phase 0.5: Socratic Code-Theory Recovery

Before changing anything, you need to recover the "theory" of the bounded context — what Peter Naur called the mental model that lives in the heads of the original developers. In a brownfield project, this model is not documented. The code is the only source.

This phase uses Socratic Code-Theory Recovery: a two-phase workflow that builds understanding through recursive question refinement before producing documentation.

The prompts in this phase are also packaged as an installable Claude Code Skill. See the Socratic Code-Theory Recovery Skill page for installation instructions across Claude Code, Codex, Cursor, GitHub Copilot, Gemini CLI, and Amazon Kiro.

Phase 1: Build the Question Tree

Start with five root questions about the bounded context. Their second level is fixed: every run emits the same prescribed set of sub-questions — the six PRD elements (Q1.1–Q1.6), six specification categories (Q2.1–Q2.6), the twelve arc42 chapters (Q3.1–Q3.12), the eight ISO/IEC 25010 characteristics plus a priority question (Q4.1–Q4.9), and five risk categories (Q5.1–Q5.5). Free, code-driven decomposition applies only below that fixed level. The fixed level keeps Q-IDs stable — Q3.7 is always the Deployment View — so trees from different runs can be diffed node-by-node.
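To illustrate why a fixed second level makes runs comparable, here is a minimal sketch that enumerates the prescribed Q-IDs and diffs two runs node by node. The branch sizes come from the text above; the function names and the idea of flattening a tree into a Q-ID-to-status mapping are assumptions made for this example, not part of the workflow's tooling.

```python
# Sizes of the fixed second level, as listed in the text:
# Q1.1-Q1.6, Q2.1-Q2.6, Q3.1-Q3.12, Q4.1-Q4.9, Q5.1-Q5.5.
FIXED_LEVEL = {1: 6, 2: 6, 3: 12, 4: 9, 5: 5}


def fixed_q_ids() -> list[str]:
    """Return every prescribed second-level Q-ID in tree order."""
    return [
        f"Q{root}.{n}"
        for root, count in FIXED_LEVEL.items()
        for n in range(1, count + 1)
    ]


def diff_runs(run_a: dict[str, str], run_b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Compare two runs on the fixed level.

    Each run maps a Q-ID to a status string such as "ANSWERED" or "OPEN"
    (a hypothetical flattening of QUESTION_TREE.adoc).
    Returns only the Q-IDs whose status differs between the runs.
    """
    changed = {}
    for q_id in fixed_q_ids():
        a = run_a.get(q_id, "MISSING")
        b = run_b.get(q_id, "MISSING")
        if a != b:
            changed[q_id] = (a, b)
    return changed


# Usage: two runs that differ only in the Deployment View node.
run_a = {q_id: "OPEN" for q_id in fixed_q_ids()}
run_b = dict(run_a)
run_b["Q3.7"] = "ANSWERED"

print(len(fixed_q_ids()))       # 38
print(diff_runs(run_a, run_b))  # {'Q3.7': ('OPEN', 'ANSWERED')}
```

Because the IDs never shift, the diff needs no fuzzy matching of question text. Nodes below the fixed level would still need a looser comparison, since their depth varies between runs.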

Each leaf in the tree is either [ANSWERED] (with code evidence: file, function, line) or [OPEN] (with Category and Ask role). The output is two AsciiDoc files: QUESTION_TREE.adoc (full reasoning trace) and OPEN_QUESTIONS.adoc (handoff document, grouped by role).
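The leaf rules can be pictured as a small data structure. The sketch below is one possible model, assuming a flat list of leaves: the `Leaf` class, its field names, and the exact AsciiDoc layout are hypothetical, while the category and role values are the ones the prompt prescribes. The sample evidence path is made up for illustration.

```python
from dataclasses import dataclass

ROLES = ["Product Owner", "Architect", "Developer", "Domain Expert", "Operations"]
CATEGORIES = {
    "business-context", "design-rationale", "quality-goals",
    "stakeholder-context", "future-direction",
}


@dataclass
class Leaf:
    q_id: str
    question: str
    status: str          # "ANSWERED" or "OPEN"
    evidence: str = ""   # "<file>:<line>" or "<file>::<function>", for ANSWERED
    category: str = ""   # one of CATEGORIES, for OPEN
    ask_role: str = ""   # one of ROLES, for OPEN

    def problems(self) -> list[str]:
        """Return rule violations for this leaf; an empty list means valid."""
        if self.status == "ANSWERED":
            return [] if self.evidence else [f"{self.q_id}: no code evidence"]
        if self.status == "OPEN":
            issues = []
            if self.category not in CATEGORIES:
                issues.append(f"{self.q_id}: unknown category {self.category!r}")
            if self.ask_role not in ROLES:
                issues.append(f"{self.q_id}: unknown ask role {self.ask_role!r}")
            return issues
        return [f"{self.q_id}: unknown status {self.status!r}"]


def render_open_questions(leaves: list[Leaf]) -> str:
    """Render the handoff document with one section per role, even if empty."""
    lines = ["= Open Questions", ""]
    for role in ROLES:
        lines += [f"== {role}", ""]
        asked = [leaf for leaf in leaves
                 if leaf.status == "OPEN" and leaf.ask_role == role]
        if not asked:
            lines.append("No open questions for this role")
        for leaf in asked:
            lines.append(f"* {leaf.q_id} [{leaf.category}] {leaf.question}")
        lines.append("")
    return "\n".join(lines)


# Usage with made-up leaves; "store/model.go:17" is a hypothetical path.
leaves = [
    Leaf("Q1.4", "Why was this built?", "OPEN",
         category="business-context", ask_role="Product Owner"),
    Leaf("Q2.4", "Which entities are persisted?", "ANSWERED",
         evidence="store/model.go:17"),
    Leaf("Q3.9", "Why JSONC instead of YAML?", "OPEN",
         category="design-rationale", ask_role="Architect"),
]
print(render_open_questions(leaves))
```

Validating leaves before rendering catches the two most likely slips: an [ANSWERED] leaf without evidence and an [OPEN] leaf routed to nobody.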

Copy the prompt below into a session that has read access to the bounded context. Adapt [bounded context path] and the example questions if your domain warrants it; do not change the leaf classification, Q-ID scheme, or output files.

Prompt: Phase 1 — Build the Question Tree
You are performing Socratic Code-Theory Recovery on a brownfield bounded
context located at [bounded context path]. Phase 1 of two.

Goal: recover the program's theory (Naur, 1985) from source code through
recursive question refinement, before any documentation is written.

Process:

1. Start with five root questions about the bounded context:
   Q1 What problem does this bounded context solve, and for whom?
   Q2 What is the specification of this bounded context?
   Q3 What is the architecture of this bounded context?
   Q4 What quality goals drive the design?
   Q5 What risks and technical debt exist?

2. The second level of the tree is FIXED, not free. Emit exactly these
   nodes, in this order, even when a node's only leaf is [OPEN] or
   [ANSWERED: not applicable]:
     Q1.1-Q1.6  product identity, primary users, channels, why-built,
                success metrics, segment priority
     Q2.1-Q2.6  actors, use-case catalog, per-interface system specs,
                data/entity model, acceptance criteria, cross-cutting
                business rules
     Q3.1-Q3.12 the twelve arc42 chapters, in arc42 order
     Q4.1-Q4.8  the eight ISO/IEC 25010 characteristics;
     Q4.9       which characteristic has priority
     Q5.1-Q5.5  technical debt, security risks, operational risks,
                dependency/supply-chain risks, scaling/performance risks

3. Below the fixed second level, decompose freely and code-driven. Stop
   when a leaf is small enough to answer from a single piece of code
   evidence, or to pose as a single precise question to a stakeholder.
   Third-level depth varies between runs — that is expected. Q-IDs are
   stable: Q3.7 is always the Deployment View, in every run, so trees
   from different runs can be diffed node-by-node.

4. For each leaf, classify it:

   [ANSWERED]
     - You found the answer in the code.
     - Cite the evidence as <file>:<line> or <file>::<function>.
     - Be exact. No "see X for details."

   [OPEN]
     - The answer is not derivable from code alone.
     - Category: business-context | design-rationale | quality-goals |
                 stakeholder-context | future-direction
     - Ask role: Product Owner | Architect | Developer | Domain Expert |
                 Operations
     - State precisely what cannot be answered, and why.

   Quality (the Q4 branch) is not wholly team knowledge. Where the code
   shows measurable behaviour — a timeout, a truncation limit, a budget,
   a retry policy, the threats and test concept from Q3.8 — write it as
   an [ANSWERED] quality scenario with file:line. Never invent a target
   number. Only the quality-goal ranking (Q4.9) is [OPEN].

5. Output two files in AsciiDoc:

   QUESTION_TREE.adoc
     - Full hierarchical tree with all nodes and Q-IDs
     - Each leaf marked [ANSWERED] (with evidence) or [OPEN] (with Category
       and Ask role)
     - Includes all reasoning, not only the leaves

   OPEN_QUESTIONS.adoc
     - Only the [OPEN] leaves, copied verbatim from QUESTION_TREE.adoc
     - Always one section per Ask role (Product Owner, Architect,
       Developer, Domain Expert, Operations) — emit every section even
       when it is empty ("No open questions for this role")
     - Each question short enough to be answered in 1-3 sentences

Do not write any other documentation in this phase. Phase 2 will synthesize
the answered tree into PRD, specification, arc42, and ADRs — only after the
team has filled in the [OPEN] leaves.

Between Phases: Team Answers the Open Questions

Route the open questions in OPEN_QUESTIONS.adoc to the people who can answer them. In a controlled experiment with a 13,000-line Go codebase, 11 targeted questions were sufficient to close the gap between reverse-engineered documentation and the original. So few questions suffice because recursive decomposition narrows each one to a single specific gap rather than a vague request for context.

Typical questions the LLM cannot answer from code:

Category              Example
Business Context      Why was this built? What alternatives existed?
Design Rationale      Why JSONC instead of YAML? Why this library?
Quality Goals         Which quality goal has priority? What are the thresholds?
Stakeholder Context   Who uses this? What is their skill level?
Future Direction      What is planned but not yet implemented?

Phase 2: Synthesize Documentation

The LLM synthesizes the answered questions plus the code evidence from Phase 1 into documentation following the spec-driven workflow:

  • PRD from the Q1 branch

  • Specification (Cockburn Use Cases, CLI spec, data models, Gherkin acceptance criteria) from the Q2 branch

  • arc42 with all 12 chapters from the Q3 branch

  • Nygard ADRs with Pugh Matrix from the Q3.9 branch

Code-derived claims carry the file:line evidence from their [ANSWERED] leaf, which points at the code, the only durable artifact. Team-provided information is marked (team answer). The Question Tree is temporary scaffolding, so Q-IDs are not written into the final documents; instead, every claim is traced back to a leaf during synthesis as a build-time check. This dual traceability (code evidence plus team input) is the key difference from a simple reverse-engineering prompt.
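A traceability check of this kind could be approximated with a few lines of script. The sketch below rests on several assumptions: claims are AsciiDoc bullet lines starting with "* ", evidence looks like `path.ext:line`, and the function name is invented for this example. The sample text and its file path are made up.

```python
import re

# Matches references such as "internal/config/load.go:42" (path.ext:line).
EVIDENCE = re.compile(r"[\w./-]+\.\w+:\d+")
TEAM_MARK = "(team answer)"


def untraced_claims(doc_text: str) -> list[str]:
    """Return bullet lines that carry neither code evidence nor a team mark."""
    missing = []
    for line in doc_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("* "):
            continue  # headings and prose are not treated as claims here
        if EVIDENCE.search(stripped) or TEAM_MARK in stripped:
            continue
        missing.append(stripped)
    return missing


# Usage with a made-up excerpt; the path is hypothetical.
doc = """== Building Block View
* Configuration is loaded once at startup (internal/config/load.go:42)
* JSONC was chosen so operators can comment their config (team answer)
* Requests time out after a fixed interval
"""
print(untraced_claims(doc))  # ['* Requests time out after a fixed interval']
```

A pattern match only shows that a claim cites something. Whether the cited line actually supports the claim still needs a reviewer or a second LLM pass.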

Establish Baseline Tests

From the synthesized Use Cases, write tests that verify the existing behavior. These tests are your safety net.

Prompt
Based on the Use Cases in docs/specs/use-cases-[context-name].adoc, write tests that verify the current behavior.
Use TDD, London School. Each test references its Use Case ID for traceability.
Do not change any production code. Only add tests.

Run the tests. Every test must pass against the current code. If a test fails, the extracted use case was wrong — fix the use case, then fix the test.

Do not skip baseline tests. Without them, you cannot distinguish between "my change broke something" and "it was already broken." This is the closed loop that makes brownfield changes safe.
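For orientation, a baseline test in the London School style might look like the following sketch. Everything domain-specific here is hypothetical: the `shipping_fee` function stands in for existing production code, and "UC-07" is an invented Use Case ID. The collaborator is replaced by a `unittest.mock.Mock`, and each test name carries the Use Case ID for traceability.

```python
import unittest
from unittest.mock import Mock


# Stand-in for existing production code. In a real project this would be
# imported from the bounded context and never redefined or edited in a test.
def shipping_fee(order_total: float, customer_repo, customer_id: str) -> float:
    customer = customer_repo.find(customer_id)
    if customer["premium"] or order_total >= 50:
        return 0.0
    return 4.9


class ShippingFeeBaselineTest(unittest.TestCase):
    """Baseline for UC-07 (hypothetical ID): calculate the shipping fee."""

    def test_uc07_premium_customer_ships_free(self):
        repo = Mock()
        repo.find.return_value = {"premium": True}
        self.assertEqual(shipping_fee(10.0, repo, "c-1"), 0.0)
        # London School: also verify the interaction with the collaborator.
        repo.find.assert_called_once_with("c-1")

    def test_uc07_small_order_pays_flat_fee(self):
        repo = Mock()
        repo.find.return_value = {"premium": False}
        self.assertEqual(shipping_fee(10.0, repo, "c-2"), 4.9)

    def test_uc07_large_order_ships_free(self):
        repo = Mock()
        repo.find.return_value = {"premium": False}
        self.assertEqual(shipping_fee(50.0, repo, "c-3"), 0.0)
```

Saved as a test module, this runs with `python -m unittest`. The assertions record what the code does today, including behavior that may later turn out to be a bug; correcting it is a separate, deliberate change.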

What the LLM Can and Cannot Recover

A controlled experiment (deleting documentation from a greenfield project and regenerating it from code) showed:

Derivable from code: Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, glossary (31 terms vs. 2 placeholders), security mechanisms, crosscutting concepts.

NOT derivable from code: Business context, design rationale (ADR "why"), quality goal priorities, stakeholder concerns, aspirational features, performance budgets, tutorials, review results.

Semantic Anchors serve a dual purpose in this workflow: prompt compression (a 69-line prompt produced 3,850 lines of correctly structured documentation) and decomposition heuristics ("arc42" generates 12 MECE sub-questions without additional instructions).

Spec Drift and Reconciliation

Even in well-documented projects, the specification drifts from the code. The implementation LLM adds security hardening, validation rules, and edge cases that were never in the original specification. This is not a discipline problem — it is a structural property of the workflow.

The fix: periodic spec reconciliation. Run the reverse-engineering prompt against current code and diff against the existing spec. The diff reveals new requirements (in code, not in spec), changed behavior (diverged), and dead spec (documented but removed).
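The three-way classification can be sketched as a set comparison. This assumes both the existing spec and the regenerated one have already been reduced to requirement IDs with normalized text, which is a simplification: in practice the wording differs between runs, and the LLM performs the semantic comparison. The function name and the sample IDs are hypothetical.

```python
def reconcile(spec: dict[str, str], code: dict[str, str]) -> dict[str, list[str]]:
    """Classify requirement IDs as NEW, CHANGED, or DEAD.

    spec: requirements from the existing specification, keyed by ID
    code: requirements reverse-engineered from the current code, keyed by ID
    """
    return {
        "NEW": sorted(k for k in code if k not in spec),
        "CHANGED": sorted(k for k in spec if k in code and spec[k] != code[k]),
        "DEAD": sorted(k for k in spec if k not in code),
    }


# Usage with made-up requirements.
spec = {
    "FR-1": "Reject uploads larger than 10 MB",
    "FR-2": "Export reports as CSV",
    "FR-3": "Send a weekly digest email",
}
code = {
    "FR-1": "Reject uploads larger than 5 MB",
    "FR-2": "Export reports as CSV",
    "FR-4": "Rate-limit login attempts",
}
print(reconcile(spec, code))
# {'NEW': ['FR-4'], 'CHANGED': ['FR-1'], 'DEAD': ['FR-3']}
```

Each bucket calls for a different decision: adopt NEW items into the spec, decide which side is right for CHANGED items, and delete or reinstate DEAD ones.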

Three natural trigger points: before a release, after a security review, before onboarding.

Phase 1-12: The Standard Workflow

Once you have use cases and baseline tests for your bounded context, the standard workflow applies.

  • New features get new use cases, new acceptance criteria, and new tests — exactly as in greenfield.

  • Bug fixes start by identifying which use case is violated, then follow the TDD bug fix loop (Step 12).

  • Refactoring is protected by the baseline tests. If the tests stay green, the refactoring is safe.

The only difference: your arc42 documentation may start incomplete. That is fine. Fill in the architecture sections as you learn about the system. After a few bounded contexts, the architecture documentation will cover the parts that matter.

Incremental Expansion

After your first bounded context is covered, pick the next one. Each context you onboard adds to the system’s spec coverage.

Over time, a pattern emerges:

Iteration       Coverage
First context   One feature area has use cases, tests, and architecture docs.
3-5 contexts    The core of the system is documented. Cross-cutting concerns become visible.
10+ contexts    Most changes happen in areas with existing specs. New work feels like greenfield.

You do not need 100% coverage. The goal is to cover the areas that change most frequently. Stable code that nobody touches does not need specs.
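One rough way to find the areas that change most often is to count changed files per bounded context. The sketch below assumes you already have a list of changed paths (for example from `git log --name-only --pretty=format:`, with blank lines removed) and a mapping from context names to path prefixes. Both the mapping and the function name are hypothetical.

```python
from collections import Counter


def rank_contexts(changed_paths: list[str], contexts: dict[str, str]) -> list[tuple[str, int]]:
    """Count changed files per bounded context, most frequently changed first.

    contexts maps a context name to the path prefix that contains it.
    Paths that match no prefix are ignored.
    """
    counts = Counter()
    for path in changed_paths:
        for name, prefix in contexts.items():
            if path.startswith(prefix):
                counts[name] += 1
                break  # a file belongs to at most one context
    return counts.most_common()


# Usage with made-up paths and prefixes.
contexts = {"billing": "src/billing/", "reports": "src/reports/"}
changed_paths = [
    "src/billing/invoice.py",
    "src/billing/tax.py",
    "src/reports/export.py",
    "src/billing/invoice.py",
    "README.md",
]
print(rank_contexts(changed_paths, contexts))  # [('billing', 3), ('reports', 1)]
```

Change counts are only a proxy. A context with a pending change request can still be the better next candidate even if its history is quiet.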

Prompt Cheat Sheet: Brownfield

Each entry lists the phase, the prompt, and the semantic anchors it relies on.

Scope
  Prompt: Analyze the codebase in [path]. Identify bounded contexts using DDD. List name, responsibility, entities, interfaces.
  Anchors: DDD

Theory Recovery (Phase 1)
  Prompt: You have access to [bounded context path]. No documentation exists. Build a Question Tree: 5 root questions (Problem/Users, Specification, Architecture, Quality Goals, Risks), each with a FIXED second level (Q1.1-Q5.5: 6 PRD elements, 6 spec categories, 12 arc42 chapters, 8 ISO 25010 characteristics + priority, 5 risk categories); free decomposition only below it. Each leaf: [ANSWERED] with file:line evidence or [OPEN] with Category and Ask role.
  Anchors: arc42, Cockburn, ISO 25010, Nygard ADR

Team Answers
  Prompt: Route OPEN_QUESTIONS.adoc to the team by Ask role. Typically 10-15 questions.
  Anchors: none

Theory Recovery (Phase 2)
  Prompt: Synthesize self-contained documentation from the Question Tree and team answers. Cite file:line evidence for code-derived claims, mark team input with (team answer), keep deferred questions as explicit gaps. Q-IDs stay out of the output.
  Anchors: Spec-Driven Workflow

Baseline Tests
  Prompt: Write tests for the Use Cases in [spec file]. Each test references its Use Case ID. Do not change production code.
  Anchors: TDD London / Chicago

Continue
  Prompt: Follow the standard workflow from Step 3 (PRD) or Step 8 (Implementation), depending on whether you are adding new features or fixing bugs.
  Anchors: none

Reconciliation
  Prompt: Compare existing spec in [path] against current code. Report: NEW (in code, not in spec), CHANGED (diverged), DEAD (in spec, not in code). Do not modify existing files.
  Anchors: none

When Not to Use This Approach

This workflow assumes you want to evolve the existing system. If you are planning a full rewrite, use the greenfield workflow instead.

It also assumes the existing code is runnable. If the system cannot be built or started, you have a different problem — fix that first.

Further Reading

  • The Harness Inventory — the error-correction layers that surround any agentic workflow, including this one. Brownfield Recovery raises the signal; the Harness Inventory catalogues what catches the remaining noise.

  • Simon Martinelli, AI Unified Process — the bounded-context approach to spec-driven development in existing systems.

  • Eric Evans, Domain-Driven Design — the foundational work on bounded contexts and strategic design.

  • Michael Feathers, Working Effectively with Legacy Code — techniques for establishing test coverage in systems without tests.

  • Peter Naur, "Programming as Theory Building" (1985) — argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code-Theory Recovery tests this claim in the context of LLM-generated code.

  • Brownfield Experiment Report — controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.

  • Fair Comparison Report — three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.