Adapting the Workflow to Brownfield Projects
Introduction
You have mastered the greenfield workflow. Now you want to apply it to an existing codebase.
The core principles remain the same: small steps, high autonomy, error correction loops. But brownfield projects add a challenge that greenfield projects do not have: the system already exists. You cannot start with a blank slate. You must understand what is there before you change it.
This document describes how to onboard an existing codebase into the spec-driven workflow. The key insight comes from Simon Martinelli’s AI Unified Process: you do not need to specify the entire system. You work one bounded context at a time. Spec coverage grows incrementally, feature by feature.
The Brownfield Paradox
In greenfield projects, you write the spec first and the code follows. In brownfield projects, the code already exists — but the spec often does not. The system is the specification, except nobody can read it.
The temptation is to reverse-engineer the entire system into documentation before changing anything. Do not do that. That is Big Upfront Documentation, and it fails for the same reasons as Big Upfront Design.
Instead: specify only the bounded context you are about to change, and only as much as you need to change it safely.
Phase 0: Scope a Bounded Context
Before touching code, identify the area you want to change.
A bounded context is a coherent slice of the system with clear boundaries. It might be a module, a service, a feature area, or a screen. The boundaries should be small enough that you can understand the context in a single session.
Use ⚓ Domain-Driven Design to identify the context boundary. The AI can help: point it at the code and ask it to identify bounded contexts and their interfaces.
Analyze the codebase in src/. Identify bounded contexts using Domain-Driven Design.
For each context, list: name, responsibility, key entities, interfaces to other contexts.
Present as a table.
Pick one bounded context to start with. Choose one that is small, well-isolated, and has a change request pending.
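The selection rule above (small, well-isolated, change request pending) can be sketched as a simple ranking. This is only an illustration: the `ContextCandidate` fields, the proxy metrics and the example contexts are all hypothetical, and in practice the table produced by the scoping prompt would supply the input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextCandidate:
    """One row of the scoping table. All fields are illustrative proxies."""
    name: str
    lines_of_code: int            # proxy for "small"
    external_interfaces: int      # proxy for "well-isolated" (fewer is better)
    pending_change_requests: int  # only contexts with pending work qualify


def pick_first_context(candidates):
    """Return the most isolated, then smallest, candidate that has pending work."""
    with_work = [c for c in candidates if c.pending_change_requests > 0]
    if not with_work:
        return None
    return min(with_work, key=lambda c: (c.external_interfaces, c.lines_of_code))


candidates = [
    ContextCandidate("billing", 4200, 5, 2),
    ContextCandidate("notifications", 900, 1, 1),
    ContextCandidate("reporting", 600, 1, 0),  # small, but nothing to change
]
first = pick_first_context(candidates)
print(first.name)  # notifications
```

The ordering of the sort key is a judgment call. Isolation is ranked before size here because a small context with many interfaces is harder to understand in a single session.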
Phase 0.5: Socratic Code-Theory Recovery
Before changing anything, you need to recover the "theory" of the bounded context — what Peter Naur called the mental model that lives in the heads of the original developers. In a brownfield project, this model is not documented. The code is the only source.
This phase uses Socratic Code-Theory Recovery: a two-phase workflow that builds understanding through recursive question refinement before producing documentation.
The prompts in this phase are also packaged as an installable Claude Code Skill. See the Socratic Code-Theory Recovery Skill page for installation instructions across Claude Code, Codex, Cursor, GitHub Copilot, Gemini CLI, and Amazon Kiro.
Phase 1: Build the Question Tree
Start with five root questions about the bounded context. Their second level is fixed: every run emits the same prescribed set of sub-questions — the six PRD elements (Q1.1–Q1.6), six specification categories (Q2.1–Q2.6), the twelve arc42 chapters (Q3.1–Q3.12), the eight ISO/IEC 25010 characteristics plus a priority question (Q4.1–Q4.9), and five risk categories (Q5.1–Q5.5). Free, code-driven decomposition applies only below that fixed level. The fixed level keeps Q-IDs stable — Q3.7 is always the Deployment View — so trees from different runs can be diffed node-by-node.
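As a rough sketch of why the fixed level makes Q-IDs stable, the second level can be generated from static lists. The chapter names follow the arc42 template, and the quality characteristics follow the 2011 edition of ISO/IEC 25010. The order of the quality characteristics and the function name `fixed_second_level` are assumptions made for this illustration.

```python
ARC42_CHAPTERS = [
    "Introduction and Goals", "Architecture Constraints", "Context and Scope",
    "Solution Strategy", "Building Block View", "Runtime View",
    "Deployment View", "Crosscutting Concepts", "Architecture Decisions",
    "Quality Requirements", "Risks and Technical Debt", "Glossary",
]

# Assumed order; the prompt only says "the eight ISO/IEC 25010 characteristics".
ISO_25010 = [
    "Functional Suitability", "Performance Efficiency", "Compatibility",
    "Usability", "Reliability", "Security", "Maintainability", "Portability",
]

FIXED_LEVEL = {
    "Q1": ["product identity", "primary users", "channels", "why-built",
           "success metrics", "segment priority"],
    "Q2": ["actors", "use-case catalog", "per-interface system specs",
           "data/entity model", "acceptance criteria",
           "cross-cutting business rules"],
    "Q3": ARC42_CHAPTERS,
    "Q4": ISO_25010 + ["which characteristic has priority"],
    "Q5": ["technical debt", "security risks", "operational risks",
           "dependency/supply-chain risks", "scaling/performance risks"],
}


def fixed_second_level():
    """Map every second-level Q-ID to its prescribed topic, in emission order."""
    return {
        f"{root}.{index}": topic
        for root, topics in FIXED_LEVEL.items()
        for index, topic in enumerate(topics, start=1)
    }


skeleton = fixed_second_level()
print(len(skeleton))      # 38
print(skeleton["Q3.7"])   # Deployment View
```

Because the IDs come from list positions rather than from the code being analyzed, two runs against different versions of the same context share the same 38 second-level nodes.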
Each leaf in the tree is either [ANSWERED] (with code evidence: file, function, line) or [OPEN] (with Category and Ask role). The output is two AsciiDoc files: QUESTION_TREE.adoc (full reasoning trace) and OPEN_QUESTIONS.adoc (handoff document, grouped by role).
Copy the prompt below into a session that has read access to the bounded context. Adapt [bounded context path] and the example questions if your domain warrants it; do not change the leaf classification, Q-ID scheme, or output files.
You are performing Socratic Code-Theory Recovery on a brownfield bounded
context located at [bounded context path]. Phase 1 of two.
Goal: recover the program's theory (Naur, 1985) from source code through
recursive question refinement, before any documentation is written.
Process:
1. Start with five root questions about the bounded context:
Q1 What problem does this bounded context solve, and for whom?
Q2 What is the specification of this bounded context?
Q3 What is the architecture of this bounded context?
Q4 What quality goals drive the design?
Q5 What risks and technical debt exist?
2. The second level of the tree is FIXED, not free. Emit exactly these
nodes, in this order, even when a node's only leaf is [OPEN] or
[ANSWERED: not applicable]:
Q1.1-Q1.6 product identity, primary users, channels, why-built,
success metrics, segment priority
Q2.1-Q2.6 actors, use-case catalog, per-interface system specs,
data/entity model, acceptance criteria, cross-cutting
business rules
Q3.1-Q3.12 the twelve arc42 chapters, in arc42 order
Q4.1-Q4.8 the eight ISO/IEC 25010 characteristics;
Q4.9 which characteristic has priority
Q5.1-Q5.5 technical debt, security risks, operational risks,
dependency/supply-chain risks, scaling/performance risks
3. Below the fixed second level, decompose freely and code-driven. Stop
when a leaf is small enough to answer from a single piece of code
evidence, or to pose as a single precise question to a stakeholder.
Third-level depth varies between runs — that is expected. Q-IDs are
stable: Q3.7 is always the Deployment View, in every run, so trees
from different runs can be diffed node-by-node.
4. For each leaf, classify it:
[ANSWERED]
- You found the answer in the code.
- Cite the evidence as <file>:<line> or <file>::<function>.
- Be exact. No "see X for details."
[OPEN]
- The answer is not derivable from code alone.
- Category: business-context | design-rationale | quality-goals |
stakeholder-context | future-direction
- Ask role: Product Owner | Architect | Developer | Domain Expert |
Operations
- State precisely what cannot be answered, and why.
Quality (the Q4 branch) is not wholly team knowledge. Where the code
shows measurable behaviour — a timeout, a truncation limit, a budget,
a retry policy, the threats and test concept from Q3.8 — write it as
an [ANSWERED] quality scenario with file:line. Never invent a target
number. Only the quality-goal ranking (Q4.9) is [OPEN].
5. Output two files in AsciiDoc:
QUESTION_TREE.adoc
- Full hierarchical tree with all nodes and Q-IDs
- Each leaf marked [ANSWERED] (with evidence) or [OPEN] (with Category
and Ask role)
- Includes all reasoning, not only the leaves
OPEN_QUESTIONS.adoc
- Only the [OPEN] leaves, copied verbatim from QUESTION_TREE.adoc
- Always one section per Ask role (Product Owner, Architect,
Developer, Domain Expert, Operations) — emit every section even
when it is empty ("No open questions for this role")
- Each question short enough to be answered in 1-3 sentences
Do not write any other documentation in this phase. Phase 2 will synthesize
the answered tree into PRD, specification, arc42, and ADRs — only after the
team has filled in the [OPEN] leaves.
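Separate from the prompt itself, the leaf classification and the OPEN_QUESTIONS.adoc layout it prescribes can be sketched as a small data model. This is a minimal illustration under stated assumptions: the `Leaf` class, the `render_open_questions` function, the example questions and the evidence path are hypothetical, and the LLM produces these files directly rather than through such code.

```python
from dataclasses import dataclass
from typing import Optional

ASK_ROLES = ["Product Owner", "Architect", "Developer", "Domain Expert", "Operations"]
CATEGORIES = {"business-context", "design-rationale", "quality-goals",
              "stakeholder-context", "future-direction"}


@dataclass
class Leaf:
    qid: str
    question: str
    status: str                      # "ANSWERED" or "OPEN"
    evidence: Optional[str] = None   # <file>:<line> or <file>::<function>
    category: Optional[str] = None
    ask_role: Optional[str] = None

    def __post_init__(self):
        # Enforce the classification rules from step 4 of the prompt.
        if self.status == "ANSWERED":
            if not self.evidence:
                raise ValueError(f"{self.qid}: ANSWERED leaf needs code evidence")
        elif self.status == "OPEN":
            if self.category not in CATEGORIES or self.ask_role not in ASK_ROLES:
                raise ValueError(f"{self.qid}: OPEN leaf needs Category and Ask role")
        else:
            raise ValueError(f"{self.qid}: unknown status {self.status!r}")


def render_open_questions(leaves):
    """Render AsciiDoc with one section per Ask role, including empty ones."""
    lines = ["= Open Questions", ""]
    for role in ASK_ROLES:
        lines += [f"== {role}", ""]
        items = [l for l in leaves if l.status == "OPEN" and l.ask_role == role]
        if items:
            lines += [f"* {l.qid} ({l.category}): {l.question}" for l in items]
        else:
            lines.append("No open questions for this role")
        lines.append("")
    return "\n".join(lines)


leaves = [
    Leaf("Q3.7.1", "Which base image is deployed?", "ANSWERED",
         evidence="deploy/Dockerfile:1"),  # hypothetical path
    Leaf("Q1.4.1", "Why was this built instead of buying a product?", "OPEN",
         category="business-context", ask_role="Product Owner"),
]
doc = render_open_questions(leaves)
print(doc)
```

Emitting every role section, even an empty one, keeps the handoff document structurally identical across runs, which makes it easier to compare.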
Between Phases: Team Answers the Open Questions
Route the Open Questions to the people who can answer them. In a controlled experiment with a 13,000-line Go codebase, 11 targeted questions were sufficient to close the gap between reverse-engineered documentation and the original. The questions are precise because the recursive decomposition ensures they are specific, not vague.
Typical questions the LLM cannot answer from code:
| Category | Example |
|---|---|
| Business Context | Why was this built? What alternatives existed? |
| Design Rationale | Why JSONC instead of YAML? Why this library? |
| Quality Goals | Which quality goal has priority? What are the thresholds? |
| Stakeholder Context | Who uses this? What is their skill level? |
| Future Direction | What is planned but not yet implemented? |
Phase 2: Synthesize Documentation
The LLM synthesizes the answered questions plus the code evidence from Phase 1 into documentation following the spec-driven workflow:
- PRD from the Q1 branch answers
- Specification (Cockburn Use Cases, CLI spec, data models, Gherkin acceptance criteria) from the Q2 branch
- arc42 with all 12 chapters from the Q3 branch
- Nygard ADRs with Pugh Matrix from the Q3.9 branch
Code-derived claims carry the file:line evidence from their [ANSWERED] leaf, which points at the code, the only durable artifact. Team-provided information is marked (team answer). The Question Tree is temporary scaffolding, so Q-IDs are not written into the final documents; instead, every claim is traced back to a leaf during synthesis as a build-time check. This dual traceability (code evidence plus team input) is the key difference from a simple reverse-engineering prompt.
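A traceability check of this kind could look roughly like the following sketch. The regular expression, the function name `untraced_claims` and the example claims (including the file path) are assumptions for illustration. The source does not specify how the check is implemented.

```python
import re

# Matches <file>:<line> or <file>::<function>, the two evidence forms
# allowed for [ANSWERED] leaves.
EVIDENCE = re.compile(r"\b[\w./-]+:\d+\b|\b[\w./-]+::\w+\b")
TEAM_MARK = "(team answer)"


def untraced_claims(claims):
    """Return the claims that carry neither code evidence nor a team marker."""
    return [
        claim for claim in claims
        if not EVIDENCE.search(claim) and TEAM_MARK not in claim
    ]


claims = [
    "HTTP requests time out after 30 seconds (client/http.go:42).",  # hypothetical
    "Configuration uses JSONC so operators can comment settings (team answer).",
    "Rate limits are enforced per tenant.",
]
print(untraced_claims(claims))  # ['Rate limits are enforced per tenant.']
```

A pattern this loose will also accept things like clock times, so it is a smoke test and not proof that the cited line supports the claim.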
Establish Baseline Tests
From the synthesized Use Cases, write tests that verify the existing behavior. These tests are your safety net.
Based on the Use Cases in docs/specs/use-cases-[context-name].adoc, write tests that verify the current behavior.
Use TDD, London School. Each test references its Use Case ID for traceability.
Do not change any production code. Only add tests.
Run the tests. Every test must pass against the current code. If a test fails, the extracted use case was wrong — fix the use case, then fix the test.
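A baseline test in this style might look like the sketch below, written with Python's `unittest` and `unittest.mock`. Everything domain-specific is hypothetical: the `OrderService` class stands in for existing production code (in a real project it would be imported, not redefined), and `UC-07` is an invented Use Case ID.

```python
import unittest
from unittest.mock import Mock


# Stand-in for existing production code, inlined only so the sketch runs.
class OrderService:
    def __init__(self, repository, mailer):
        self._repository = repository
        self._mailer = mailer

    def cancel(self, order_id):
        order = self._repository.find(order_id)
        if order["status"] == "shipped":
            return False
        self._repository.save({**order, "status": "cancelled"})
        self._mailer.send_cancellation(order["customer"])
        return True


class CancelOrderBaselineTest(unittest.TestCase):
    """Baseline for UC-07 Cancel Order (hypothetical Use Case ID)."""

    def setUp(self):
        # London School: collaborators are mocked, interactions are verified.
        self.repository = Mock()
        self.mailer = Mock()
        self.service = OrderService(self.repository, self.mailer)

    def test_uc07_open_order_is_cancelled_and_customer_notified(self):
        self.repository.find.return_value = {
            "id": 1, "status": "open", "customer": "a@example.com"}
        self.assertTrue(self.service.cancel(1))
        self.repository.save.assert_called_once_with(
            {"id": 1, "status": "cancelled", "customer": "a@example.com"})
        self.mailer.send_cancellation.assert_called_once_with("a@example.com")

    def test_uc07_shipped_order_is_left_untouched(self):
        self.repository.find.return_value = {
            "id": 1, "status": "shipped", "customer": "a@example.com"}
        self.assertFalse(self.service.cancel(1))
        self.repository.save.assert_not_called()
        self.mailer.send_cancellation.assert_not_called()
```

Saved in a test module, this runs with `python -m unittest`. The tests assert what the code does today, not what it should do, so a surprising behavior is recorded first and questioned afterwards.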
Do not skip baseline tests. Without them, you cannot distinguish between "my change broke something" and "it was already broken." This is the closed loop that makes brownfield changes safe.
What the LLM Can and Cannot Recover
A controlled experiment (deleting documentation from a greenfield project and regenerating it from code) showed:
Derivable from code: Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, glossary (31 terms vs. 2 placeholders), security mechanisms, crosscutting concepts.
NOT derivable from code: Business context, design rationale (ADR "why"), quality goal priorities, stakeholder concerns, aspirational features, performance budgets, tutorials, review results.
Semantic Anchors serve a dual purpose in this workflow: prompt compression (a 69-line prompt produced 3,850 lines of correctly structured documentation) and decomposition heuristics ("arc42" generates 12 MECE sub-questions without additional instructions).
Spec Drift and Reconciliation
Even in well-documented projects, the specification drifts from the code. The implementation LLM adds security hardening, validation rules, and edge cases that were never in the original specification. This is not a discipline problem — it is a structural property of the workflow.
The fix: periodic spec reconciliation. Run the reverse-engineering prompt against current code and diff against the existing spec. The diff reveals new requirements (in code, not in spec), changed behavior (diverged), and dead spec (documented but removed).
Three natural trigger points: before a release, after a security review, before onboarding.
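The three-way diff behind reconciliation can be sketched with plain set operations. This assumes that both the existing spec and the freshly recovered one are keyed by a stable requirement identifier and hold normalized statements. That keying, the `reconcile` function and the example entries are assumptions made for this illustration.

```python
def reconcile(spec, recovered):
    """Compare requirement-ID -> statement maps from the spec and from code."""
    new = sorted(recovered.keys() - spec.keys())    # in code, not in spec
    dead = sorted(spec.keys() - recovered.keys())   # documented but removed
    changed = sorted(                               # present in both, diverged
        key for key in spec.keys() & recovered.keys()
        if spec[key] != recovered[key]
    )
    return {"new": new, "changed": changed, "dead": dead}


spec = {
    "UC-01": "User logs in with a password",
    "UC-02": "User exports a CSV file",
    "UC-03": "User prints a report",
}
recovered = {
    "UC-01": "User logs in with a password; lockout after 5 failures",
    "UC-02": "User exports a CSV file",
    "UC-04": "Uploads are limited in size",
}
print(reconcile(spec, recovered))
# {'new': ['UC-04'], 'changed': ['UC-01'], 'dead': ['UC-03']}
```

Exact string comparison is the weakest part of this sketch. Regenerated text rarely matches word for word, so a real run would probably need a human or an LLM to judge whether two statements have diverged in meaning.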
Steps 1-12: The Standard Workflow
Once you have use cases and baseline tests for your bounded context, the standard workflow applies.
- New features get new use cases, new acceptance criteria, and new tests, exactly as in greenfield.
- Bug fixes start by identifying which use case is violated, then follow the TDD bug fix loop (Step 12).
- Refactoring is protected by the baseline tests. If the tests stay green, the refactoring is safe.
The only difference: your arc42 documentation may start incomplete. That is fine. Fill in the architecture sections as you learn about the system. After a few bounded contexts, the architecture documentation will cover the parts that matter.
Incremental Expansion
After your first bounded context is covered, pick the next one. Each context you onboard adds to the system’s spec coverage.
Over time, a pattern emerges:
| Iteration | Coverage |
|---|---|
| First context | One feature area has use cases, tests, and architecture docs. |
| 3-5 contexts | The core of the system is documented. Cross-cutting concerns become visible. |
| 10+ contexts | Most changes happen in areas with existing specs. New work feels like greenfield. |
You do not need 100% coverage. The goal is to cover the areas that change most frequently. Stable code that nobody touches does not need specs.
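One way to make "cover what changes most" measurable is to weight coverage by change frequency rather than by code size. The sketch below assumes change counts per context are available, for example from version-control history. The function names and the numbers are hypothetical.

```python
def change_weighted_coverage(changes_per_context, covered):
    """Fraction of all changes that land in contexts which already have specs."""
    total = sum(changes_per_context.values())
    if total == 0:
        return 0.0
    hit = sum(n for ctx, n in changes_per_context.items() if ctx in covered)
    return hit / total


def next_context(changes_per_context, covered):
    """Suggest the uncovered context that changes most often, if any."""
    uncovered = {c: n for c, n in changes_per_context.items() if c not in covered}
    return max(uncovered, key=uncovered.get) if uncovered else None


changes = {"billing": 40, "notifications": 25, "reporting": 30, "legacy-import": 5}
covered = {"billing", "notifications"}

coverage = change_weighted_coverage(changes, covered)
print(f"{coverage:.2f}")               # 0.65
print(next_context(changes, covered))  # reporting
```

Under this metric, a stable context such as the hypothetical `legacy-import` barely moves the number, which matches the advice to leave untouched code unspecified.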
Prompt Cheat Sheet: Brownfield
| Phase | Prompt | Anchors |
|---|---|---|
| Scope | | |
| Theory Recovery (Phase 1) | | |
| Team Answers | Route OPEN_QUESTIONS.adoc to the team by Ask role. Typically 10-15 questions. | - |
| Theory Recovery (Phase 2) | | |
| Baseline Tests | | |
| Continue | Follow the standard workflow from Step 3 (PRD) or Step 8 (Implementation), depending on whether you are adding new features or fixing bugs. | - |
| Reconciliation | | - |
When Not to Use This Approach
This workflow assumes you want to evolve the existing system. If you are planning a full rewrite, use the greenfield workflow instead.
It also assumes the existing code is runnable. If the system cannot be built or started, you have a different problem — fix that first.
Further Reading
- The Harness Inventory: the error-correction layers that surround any agentic workflow, including this one. Brownfield Recovery raises the signal; the Harness Inventory catalogues what catches the remaining noise.
- Simon Martinelli, AI Unified Process: the bounded-context approach to spec-driven development in existing systems.
- Eric Evans, Domain-Driven Design: the foundational work on bounded contexts and strategic design.
- Michael Feathers, Working Effectively with Legacy Code: techniques for establishing test coverage in systems without tests.
- Peter Naur, "Programming as Theory Building" (1985): argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code-Theory Recovery tests this claim in the context of LLM-generated code.
- Brownfield Experiment Report: a controlled experiment that deletes documentation from a greenfield project, regenerates it from code, and compares the two. Full methodology and findings.
- Fair Comparison Report: three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.