Adapting the Workflow to Brownfield Projects

Introduction

You have mastered the greenfield workflow. Now you want to apply it to an existing codebase.

The core principles remain the same: small steps, high autonomy, error correction loops. But brownfield projects add a challenge that greenfield projects do not have: the system already exists. You cannot start with a blank slate. You must understand what is there before you change it.

This document describes how to onboard an existing codebase into the spec-driven workflow. The key insight comes from Simon Martinelli’s AI Unified Process: you do not need to specify the entire system. You work one bounded context at a time. Spec coverage grows incrementally, feature by feature.

The Brownfield Paradox

In greenfield projects, you write the spec first and the code follows. In brownfield projects, the code already exists — but the spec often does not. The system is the specification, except nobody can read it.

The temptation is to reverse-engineer the entire system into documentation before changing anything. Do not do that. That is Big Upfront Documentation, and it fails for the same reasons as Big Upfront Design.

Instead: specify only the bounded context you are about to change, and only as much as you need to change it safely.

Phase 0: Scope a Bounded Context

Before touching code, identify the area you want to change.

A bounded context is a coherent slice of the system with clear boundaries. It might be a module, a service, a feature area, or a screen. The boundaries should be small enough that you can understand the context in a single session.

Use ⚓ Domain-Driven Design to identify the context boundary. The AI can help: point it at the code and ask it to identify bounded contexts and their interfaces.

Prompt
Analyze the codebase in src/. Identify bounded contexts using Domain-Driven Design.
For each context, list: name, responsibility, key entities, interfaces to other contexts.
Present as a table.
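
For illustration, the resulting table might look like this (context names and details invented):

Context    Responsibility           Key Entities       Interfaces
Ordering   Order lifecycle          Order, LineItem    Billing (events), Inventory (API calls)
Billing    Invoicing and payments   Invoice, Payment   Ordering (consumes order events)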

Pick one bounded context to start with. Choose one that is small, well-isolated, and has a change request pending.

Phase 0.5: Socratic Code Theory Recovery

Before changing anything, you need to recover the "theory" of the bounded context — what Peter Naur called the mental model that lives in the heads of the original developers. In a brownfield project, this model is not documented. The code is the only source.

This phase uses Socratic Code Theory Recovery: a two-phase workflow that builds understanding through recursive question refinement before producing documentation.

Phase 1: Build the Question Tree

Start with five high-level questions about the bounded context and decompose them recursively. Use Semantic Anchors as decomposition guides: arc42 for architecture, Cockburn Use Cases for specification, ISO 25010 for quality, Nygard ADRs for decisions.

Starting Questions (adapt to your bounded context)
1. What problem does this bounded context solve and for whom?
2. What is the specification of this bounded context?
3. What is the architecture of this bounded context?
4. What quality goals drive the design?
5. What risks and technical debt exist?

Each leaf in the tree is either [ANSWERED] (with code evidence: file, function, line) or [OPEN] (with Category and Ask role).
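
For illustration, a fragment of the tree might look like this (content invented; note that the Q-3.9 branch later feeds the ADRs in Phase 2):

Q-3: What is the architecture of this bounded context?
  Q-3.1: What are the building blocks and their dependencies? [ANSWERED: import/runner.go, Run(), lines 40-95]
  Q-3.9: Which architecture decisions were made, and why? [OPEN: Design Rationale, Ask: Architect]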

The output is two files:

  • QUESTION_TREE.adoc — the full reasoning trace

  • OPEN_QUESTIONS.adoc — the handoff document, grouped by role (Product Owner, Architect, Developer, Domain Expert, Operations)
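
A hypothetical excerpt from OPEN_QUESTIONS.adoc (questions invented for illustration):

Ask: Architect
  Q-3.9.2 (Design Rationale): Why does the importer bypass the repository layer?

Ask: Product Owner
  Q-1.3 (Stakeholder Context): Who are the secondary users of the export feature?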

Between Phases: Team Answers the Open Questions

Route the Open Questions to the people who can answer them. In a controlled experiment with a 13,000-line Go codebase, 11 targeted questions were sufficient to close the gap between reverse-engineered documentation and the original. The recursive decomposition keeps the questions specific rather than vague, which is why so few are needed.

Typical questions the LLM cannot answer from code:

Category             Example
Business Context     Why was this built? What alternatives existed?
Design Rationale     Why JSONC instead of YAML? Why this library?
Quality Goals        Which quality goal has priority? What are the thresholds?
Stakeholder Context  Who uses this? What is their skill level?
Future Direction     What is planned but not yet implemented?

Phase 2: Synthesize Documentation

The LLM synthesizes the answered questions plus the code evidence from Phase 1 into documentation following the spec-driven workflow:

  • PRD from Q-1 branch answers

  • Specification (Cockburn Use Cases, CLI spec, data models, Gherkin acceptance criteria) from Q-2 branch

  • arc42 with all 12 chapters from Q-3 branch

  • Nygard ADRs with Pugh Matrix from Q-3.9 branch

Every claim references a Question ID and marks team-provided information with (team answer). This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.
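
For illustration, two synthesized claims might read like this (Q-IDs and content invented):

The importer retries failed batches three times (Q-3.2.4, evidence: import/runner.go:88).
Reliability outranks throughput for this context (Q-4.1, team answer).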

Establish Baseline Tests

From the synthesized Use Cases, write tests that verify the existing behavior. These tests are your safety net.

Prompt
Based on the Use Cases in docs/specs/use-cases-[context-name].adoc, write tests that verify the current behavior.
Use TDD, London School. Each test references its Use Case ID for traceability.
Do not change any production code. Only add tests.

Run the tests. Every test must pass against the current code. If a test fails, the extracted use case was wrong — fix the use case, then fix the test.
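
As a sketch of what such a test might look like, here is a baseline test in Go (the language of the experiment's codebase) for a hypothetical use case UC-03, assuming a hypothetical OrderService with an OrderStore collaborator. All names are invented; adapt them to your codebase.

// orders_baseline_test.go
package orders

import "testing"

// stubStore stands in for the OrderStore collaborator (London School:
// exercise the service against a test double, not the real store).
type stubStore struct{ status string }

func (s stubStore) Status(id string) (string, error) { return s.status, nil }

// TestUC03_LookupOrderStatus pins current behavior. It must pass against
// the unmodified production code before any change is made.
func TestUC03_LookupOrderStatus(t *testing.T) {
    svc := NewOrderService(stubStore{status: "shipped"}) // assumed constructor
    got, err := svc.Status("order-42")
    if err != nil {
        t.Fatalf("UC-03: unexpected error: %v", err)
    }
    if got != "shipped" {
        t.Errorf("UC-03: status = %q, want %q", got, "shipped")
    }
}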

Do not skip baseline tests. Without them, you cannot distinguish between "my change broke something" and "it was already broken." This is the closed loop that makes brownfield changes safe.

What the LLM Can and Cannot Recover

A controlled experiment (deleting documentation from a greenfield project and regenerating it from code) showed:

Derivable from code: Functional requirements (21 vs. 7 in the original), acceptance criteria (69 vs. 40), building block views, glossary (31 terms vs. 2 placeholders), security mechanisms, crosscutting concepts.

NOT derivable from code: Business context, design rationale (ADR "why"), quality goal priorities, stakeholder concerns, aspirational features, performance budgets, tutorials, review results.

Semantic Anchors serve a dual purpose in this workflow: prompt compression (a 69-line prompt produced 3,850 lines of correctly structured documentation) and decomposition heuristics ("arc42" generates 12 MECE sub-questions, mutually exclusive and collectively exhaustive, without additional instructions).

Spec Drift and Reconciliation

Even in well-documented projects, the specification drifts from the code. The implementation LLM adds security hardening, validation rules, and edge cases that were never in the original specification. This is not a discipline problem — it is a structural property of the workflow.

The fix: periodic spec reconciliation. Run the reverse-engineering prompt against current code and diff against the existing spec. The diff reveals new requirements (in code, not in spec), changed behavior (diverged), and dead spec (documented but removed).
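
An illustrative shape for that report (entries invented):

NEW      Rate limiting on the list endpoint (middleware/ratelimit.go; not in spec)
CHANGED  UC-05 default page size: spec says 20, code uses 50 (handlers/list.go)
DEAD     UC-09 "export as XML": still in spec, no longer in code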

Three natural trigger points: before a release, after a security review, before onboarding.

Phases 1-12: The Standard Workflow

Once you have use cases and baseline tests for your bounded context, the standard workflow applies.

  • New features get new use cases, new acceptance criteria, and new tests — exactly as in greenfield.

  • Bug fixes start by identifying which use case is violated, then follow the TDD bug fix loop (Step 12).

  • Refactoring is protected by the baseline tests. If the tests stay green, the refactoring is safe.

The only difference: your arc42 documentation may start incomplete. That is fine. Fill in the architecture sections as you learn about the system. After a few bounded contexts, the architecture documentation will cover the parts that matter.

Incremental Expansion

After your first bounded context is covered, pick the next one. Each context you onboard adds to the system’s spec coverage.

Over time, a pattern emerges:

Iteration       Coverage
First context   One feature area has use cases, tests, and architecture docs.
3-5 contexts    The core of the system is documented. Cross-cutting concerns become visible.
10+ contexts    Most changes happen in areas with existing specs. New work feels like greenfield.

You do not need 100% coverage. The goal is to cover the areas that change most frequently. Stable code that nobody touches does not need specs.

Prompt Cheat Sheet: Brownfield

Scope
  Prompt:  Analyze the codebase in [path]. Identify bounded contexts using DDD. List name, responsibility, entities, interfaces.
  Anchors: DDD

Theory Recovery (Phase 1)
  Prompt:  You have access to [bounded context path]. No documentation exists. Build a Question Tree by recursively refining 5 questions: Problem/Users, Specification, Architecture, Quality Goals, Risks. Each leaf: [ANSWERED] with code evidence or [OPEN] with Category and Ask role.
  Anchors: arc42, Cockburn, ISO 25010, Nygard ADR

Team Answers
  Prompt:  Route OPEN_QUESTIONS.adoc to the team by Ask role. Typically 10-15 questions.
  Anchors: (none)

Theory Recovery (Phase 2)
  Prompt:  Synthesize documentation from the Question Tree and team answers. Every claim references a Q-ID. Mark team input with (team answer).
  Anchors: Spec-Driven Workflow

Baseline Tests
  Prompt:  Write tests for the Use Cases in [spec file]. Each test references its Use Case ID. Do not change production code.
  Anchors: TDD London / Chicago

Continue
  Prompt:  Follow the standard workflow from Step 3 (PRD) or Step 8 (Implementation), depending on whether you are adding new features or fixing bugs.
  Anchors: (none)

Reconciliation
  Prompt:  Compare existing spec in [path] against current code. Report: NEW (in code, not in spec), CHANGED (diverged), DEAD (in spec, not in code). Do not modify existing files.
  Anchors: (none)

When Not to Use This Approach

This workflow assumes you want to evolve the existing system. If you are planning a full rewrite, use the greenfield workflow instead.

It also assumes the existing code is runnable. If the system cannot be built or started, you have a different problem — fix that first.

Further Reading

  • Simon Martinelli, AI Unified Process — the bounded-context approach to spec-driven development in existing systems.

  • Eric Evans, Domain-Driven Design — the foundational work on bounded contexts and strategic design.

  • Michael Feathers, Working Effectively with Legacy Code — techniques for establishing test coverage in systems without tests.

  • Peter Naur, "Programming as Theory Building" (1985) — argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code Theory Recovery tests this claim in the context of LLM-generated code.

  • Brownfield Experiment Report — controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.

  • Fair Comparison Report — three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.