# Adapting the Workflow to Brownfield Projects

## Introduction
You have mastered the greenfield workflow. Now you want to apply it to an existing codebase.
The core principles remain the same: small steps, high autonomy, error correction loops. But brownfield projects add a challenge that greenfield projects do not have: the system already exists. You cannot start with a blank slate. You must understand what is there before you change it.
This document describes how to onboard an existing codebase into the spec-driven workflow. The key insight comes from Simon Martinelli’s AI Unified Process: you do not need to specify the entire system. You work one bounded context at a time. Spec coverage grows incrementally, feature by feature.
## The Brownfield Paradox
In greenfield projects, you write the spec first and the code follows. In brownfield projects, the code already exists — but the spec often does not. The system is the specification, except nobody can read it.
The temptation is to reverse-engineer the entire system into documentation before changing anything. Do not do that. That is Big Upfront Documentation, and it fails for the same reasons as Big Upfront Design.
Instead: specify only the bounded context you are about to change, and only as much as you need to change it safely.
## Phase 0: Scope a Bounded Context
Before touching code, identify the area you want to change.
A bounded context is a coherent slice of the system with clear boundaries. It might be a module, a service, a feature area, or a screen. The boundaries should be small enough that you can understand the context in a single session.
Use ⚓ Domain-Driven Design to identify the context boundary. The AI can help: point it at the code and ask it to identify bounded contexts and their interfaces.
> Analyze the codebase in src/. Identify bounded contexts using Domain-Driven Design.
> For each context, list: name, responsibility, key entities, interfaces to other contexts.
> Present as a table.
Pick one bounded context to start with. Choose one that is small, well-isolated, and has a change request pending.
## Phase 0.5: Socratic Code Theory Recovery
Before changing anything, you need to recover the "theory" of the bounded context — what Peter Naur called the mental model that lives in the heads of the original developers. In a brownfield project, this model is not documented. The code is the only source.
This phase uses Socratic Code Theory Recovery: a two-phase workflow that builds understanding through recursive question refinement before producing documentation.
### Phase 1: Build the Question Tree
Start with five high-level questions about the bounded context and decompose them recursively. Use Semantic Anchors as decomposition guides: arc42 for architecture, Cockburn Use Cases for specification, ISO 25010 for quality, Nygard ADRs for decisions.
1. What problem does this bounded context solve and for whom?
2. What is the specification of this bounded context?
3. What is the architecture of this bounded context?
4. What quality goals drive the design?
5. What risks and technical debt exist?
Each leaf in the tree is either [ANSWERED] (with code evidence: file, function, line) or [OPEN] (with Category and Ask role).
The output is two files:
- `QUESTION_TREE.adoc` — the full reasoning trace
- `OPEN_QUESTIONS.adoc` — the handoff document, grouped by role (Product Owner, Architect, Developer, Domain Expert, Operations)
### Between Phases: Team Answers the Open Questions
Route the Open Questions to the people who can answer them. In a controlled experiment with a 13,000-line Go codebase, 11 targeted questions were sufficient to close the gap between reverse-engineered documentation and the original. The questions are precise because the recursive decomposition ensures they are specific, not vague.
Typical questions the LLM cannot answer from code:
| Category | Example |
|---|---|
| Business Context | Why was this built? What alternatives existed? |
| Design Rationale | Why JSONC instead of YAML? Why this library? |
| Quality Goals | Which quality goal has priority? What are the thresholds? |
| Stakeholder Context | Who uses this? What is their skill level? |
| Future Direction | What is planned but not yet implemented? |
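Once every open question carries an Ask role, routing is mechanical. A hedged sketch — the question records are hypothetical; in practice they come from OPEN_QUESTIONS.adoc:

```python
from collections import defaultdict

# Hypothetical OPEN questions as (id, category, ask_role, text) tuples.
open_questions = [
    ("Q-1.2", "Business Context", "Product Owner", "Why was this built?"),
    ("Q-3.9", "Design Rationale", "Architect", "Why JSONC instead of YAML?"),
    ("Q-4.1", "Quality Goals", "Architect", "Which quality goal has priority?"),
    ("Q-1.4", "Stakeholder Context", "Product Owner", "Who uses this?"),
]

def group_by_role(questions):
    """Group open questions by Ask role, preserving order, for the handoff."""
    grouped = defaultdict(list)
    for qid, category, role, text in questions:
        grouped[role].append(f"{qid} [{category}] {text}")
    return dict(grouped)

for role, items in group_by_role(open_questions).items():
    print(f"== {role}")
    for item in items:
        print(f"  {item}")
```

Each role then receives only its own section, which keeps the round-trip with the team short.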
### Phase 2: Synthesize Documentation
The LLM synthesizes the answered questions plus the code evidence from Phase 1 into documentation following the spec-driven workflow:
- PRD from Q-1 branch answers
- Specification (Cockburn Use Cases, CLI spec, data models, Gherkin acceptance criteria) from Q-2 branch
- arc42 with all 12 chapters from Q-3 branch
- Nygard ADRs with Pugh Matrix from Q-3.9 branch
Every claim references a Question ID and marks team-provided information with (team answer). This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.
## Establish Baseline Tests
From the synthesized Use Cases, write tests that verify the existing behavior. These tests are your safety net.
> Based on the Use Cases in docs/specs/use-cases-[context-name].adoc, write tests that verify the current behavior.
> Use TDD, London School. Each test references its Use Case ID for traceability.
> Do not change any production code. Only add tests.
Run the tests. Every test must pass against the current code. If a test fails, the extracted use case was wrong — fix the use case, then fix the test.
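A baseline test might look like the sketch below (London School, collaborators mocked). `OrderService` and use case UC-7 are hypothetical stand-ins for your context's code and spec:

```python
# Characterization tests for current behavior, traced to use case UC-7.
# OrderService is a hypothetical stand-in for an existing production class.
from unittest.mock import Mock

class OrderService:
    """Stand-in for the existing class in the bounded context under test."""
    def __init__(self, repo):
        self.repo = repo

    def cancel(self, order_id):
        order = self.repo.find(order_id)
        if order is None:
            raise KeyError(order_id)
        order["status"] = "cancelled"
        self.repo.save(order)
        return order

def test_cancel_marks_order_cancelled():
    """UC-7, main success scenario: cancelling sets the order status."""
    repo = Mock()  # London School: the repository collaborator is mocked
    repo.find.return_value = {"id": 42, "status": "open"}

    result = OrderService(repo).cancel(42)

    assert result["status"] == "cancelled"
    repo.save.assert_called_once_with({"id": 42, "status": "cancelled"})

def test_cancel_unknown_order_raises():
    """UC-7, extension 3a: an unknown order id is rejected."""
    repo = Mock()
    repo.find.return_value = None
    try:
        OrderService(repo).cancel(999)
    except KeyError:
        pass
    else:
        raise AssertionError("expected KeyError for unknown order id")
```

Note that these tests assert what the code does today, not what it should do; any surprise they surface goes back into the use case first.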
> **Warning:** Do not skip baseline tests. Without them, you cannot distinguish between "my change broke something" and "it was already broken." This is the closed loop that makes brownfield changes safe.
## What the LLM Can and Cannot Recover
A controlled experiment (deleting documentation from a greenfield project and regenerating it from code) showed:
Derivable from code: Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, glossary (31 terms vs. 2 placeholders), security mechanisms, crosscutting concepts.
NOT derivable from code: Business context, design rationale (ADR "why"), quality goal priorities, stakeholder concerns, aspirational features, performance budgets, tutorials, review results.
Semantic Anchors serve a dual purpose in this workflow: prompt compression (a 69-line prompt produced 3,850 lines of correctly structured documentation) and decomposition heuristics ("arc42" generates 12 MECE sub-questions without additional instructions).
## Spec Drift and Reconciliation
Even in well-documented projects, the specification drifts from the code. The implementation LLM adds security hardening, validation rules, and edge cases that were never in the original specification. This is not a discipline problem — it is a structural property of the workflow.
The fix: periodic spec reconciliation. Run the reverse-engineering prompt against current code and diff against the existing spec. The diff reveals new requirements (in code, not in spec), changed behavior (diverged), and dead spec (documented but removed).
Three natural trigger points: before a release, after a security review, before onboarding.
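The diff itself needs no special tooling. A minimal sketch using Python's `difflib` — the spec lines below are illustrative, and changed behavior shows up as a paired removal and addition:

```python
import difflib

def classify_spec_drift(spec_lines, regenerated_lines):
    """Compare the existing spec against a spec regenerated from code.
    Lines only in the regenerated spec are candidate new requirements
    (in code, not in spec); lines only in the existing spec are
    candidate dead spec (documented but removed)."""
    new_requirements, dead_spec = [], []
    for line in difflib.unified_diff(spec_lines, regenerated_lines,
                                     lineterm="", n=0):
        if line.startswith("+") and not line.startswith("+++"):
            new_requirements.append(line[1:])
        elif line.startswith("-") and not line.startswith("---"):
            dead_spec.append(line[1:])
    return new_requirements, dead_spec

existing = ["UC-7: cancel an order", "UC-8: export report as PDF"]
regenerated = ["UC-7: cancel an order", "UC-9: rate-limit API requests"]
new, dead = classify_spec_drift(existing, regenerated)
print(new)   # lines present only in the regenerated spec
print(dead)  # lines present only in the existing spec
```

In practice you would diff at the level of use cases or requirement IDs rather than raw lines, but the three drift categories fall out the same way.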
## Phases 1-12: The Standard Workflow
Once you have use cases and baseline tests for your bounded context, the standard workflow applies.
- New features get new use cases, new acceptance criteria, and new tests — exactly as in greenfield.
- Bug fixes start by identifying which use case is violated, then follow the TDD bug fix loop (Step 12).
- Refactoring is protected by the baseline tests. If the tests stay green, the refactoring is safe.
The only difference: your arc42 documentation may start incomplete. That is fine. Fill in the architecture sections as you learn about the system. After a few bounded contexts, the architecture documentation will cover the parts that matter.
## Incremental Expansion
After your first bounded context is covered, pick the next one. Each context you onboard adds to the system’s spec coverage.
Over time, a pattern emerges:
| Iteration | Coverage |
|---|---|
| First context | One feature area has use cases, tests, and architecture docs. |
| 3-5 contexts | The core of the system is documented. Cross-cutting concerns become visible. |
| 10+ contexts | Most changes happen in areas with existing specs. New work feels like greenfield. |
You do not need 100% coverage. The goal is to cover the areas that change most frequently. Stable code that nobody touches does not need specs.
## Prompt Cheat Sheet: Brownfield
| Phase | Prompt | Anchors |
|---|---|---|
| Scope | | |
| Theory Recovery (Phase 1) | | |
| Team Answers | Route `OPEN_QUESTIONS.adoc` to the team by Ask role. Typically 10-15 questions. | — |
| Theory Recovery (Phase 2) | | |
| Baseline Tests | | |
| Continue | Follow the standard workflow from Step 3 (PRD) or Step 8 (Implementation), depending on whether you are adding new features or fixing bugs. | — |
| Reconciliation | | — |
## When Not to Use This Approach
This workflow assumes you want to evolve the existing system. If you are planning a full rewrite, use the greenfield workflow instead.
It also assumes the existing code is runnable. If the system cannot be built or started, you have a different problem — fix that first.
## Further Reading
- Simon Martinelli, AI Unified Process — the bounded-context approach to spec-driven development in existing systems.
- Eric Evans, Domain-Driven Design — the foundational work on bounded contexts and strategic design.
- Michael Feathers, Working Effectively with Legacy Code — techniques for establishing test coverage in systems without tests.
- Peter Naur, "Programming as Theory Building" (1985) — argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code Theory Recovery tests this claim in the context of LLM-generated code.
- Brownfield Experiment Report — controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.
- Fair Comparison Report — three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.