Spec-Driven Development with Semantic Anchors
Introduction
This document describes how to build production-quality software with AI agents, guided by Semantic Anchors.
Semantic Anchors are compact terms that reliably activate rich knowledge domains in LLMs. Instead of writing pages of instructions, you reference a concept the model already understands deeply. "Use TDD, London School" is shorter than explaining test-driven development with mocks and outside-in design. "Follow arc42" is shorter than describing 12 architecture sections. The prompts stay short, precise, and maintainable.
This workflow was used to build three open source projects, all 100% AI-generated. The golden rule: I only prompt; I never touch the code myself. Every line of code, every test, every documentation file was written by the AI under my guidance.
- dacli: A full CLI tool with spec, architecture docs, tests, and user manual. Built by one AI, cross-reviewed by 5 different LLMs.
- Semantic Anchors Website: 228+ GitHub stars, documentation, and a video series.
- Vibe Coding Risk Radar: An interactive web app for assessing AI coding risks.
Note: Semantic Anchors marked with ⚓ are highlighted throughout this document. Click on any anchor to see its full definition on the Semantic Anchors website.
Note: This workflow is designed for greenfield projects built from scratch with AI assistance. For existing codebases, see Adapting the Workflow to Brownfield Projects.
The Key Principle: Small Steps, High Autonomy
The most common mistake with AI coding is prompting "Build me a CLI tool" and expecting a working result. That is like asking a junior developer to build an entire application from a one-sentence briefing. The result will be unpredictable at best.
This workflow takes the opposite approach: break the work into small, well-defined steps and let the AI handle each one autonomously. Each step produces a concrete artifact that you can review if you see the need for it.
The paradox: the smaller you make each task, the more autonomy you can give the agent. A vague "build me X" needs constant supervision. A precise "implement issue #42 using TDD, respecting the spec and architecture in src/docs/" can run on its own. The phases described in this document are designed to produce exactly that kind of precise, self-contained task.
This connects directly to Eichhorst’s Principle, which applies Shannon’s noisy channel theorem to LLM coding. An LLM is not a deterministic tool. It is a noisy, non-deterministic channel. It hallucinates, loses context, and is sometimes plain wrong. But an agent in a loop corrects itself: the compiler reports an error, the agent reads it, fixes the code, runs the tests, reads the failure, fixes the logic, and repeats until green. That is not magic — that is error correction, exactly as Shannon described.
When you prompt an LLM and paste the result into your project, you run an open loop. No compiler check, no test suite, no review. The LLM guesses once and you hope it guessed right. When an agent writes code, runs the compiler, runs the tests, and iterates until everything passes, that is a closed loop. The same principle that makes a thermostat work.
Different tests correct different error classes. The compiler catches syntax errors. Unit tests catch logic errors. BDD tests catch domain errors. Each layer increases the reliability of the channel. Untested code is an uncorrected channel — the noise passes straight through.
The consequence: better tests beat better prompts. A comprehensive test suite turns a mediocre model into a reliable coding partner. And if the complexity of a specification exceeds the capacity of the LLM, more tokens will not help. The answer is smaller specifications, clearer boundaries, and better tests.
Each small step in this workflow is a short transmission over the noisy channel. Short transmissions with error correction are far more reliable than one long, unchecked transmission.
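The closed loop described above can be sketched in a few lines. This is a deliberately toy simulation, not a real agent: `noisy_generate` stands in for the LLM (a noisy channel that improves when given concrete error feedback), and `run_checks` stands in for the error-correction layers (compiler, test suite).

```python
def run_checks(code: str) -> list[str]:
    """Stand-in for the error-correction layers (compiler, tests).
    Returns a list of error messages; empty means everything passes."""
    errors = []
    if "return a + b" not in code:
        errors.append("FAILED test_add: expected 3, got None")
    return errors

def noisy_generate(feedback: list[str]) -> str:
    """Stand-in for the LLM: the first attempt is wrong, but given
    concrete error feedback it produces the fix."""
    if any("FAILED" in msg for msg in feedback):
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    pass"

def closed_loop(max_attempts: int = 5) -> int:
    """Generate, check, feed the errors back, repeat until green."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        code = noisy_generate(feedback)
        feedback = run_checks(code)
        if not feedback:
            return attempt  # green: the loop corrected the noise
    return -1  # task exceeds channel capacity: split it into smaller steps

print(closed_loop())  # the toy loop goes green on the second attempt
```

The open loop is `noisy_generate([])` with no check afterwards: you ship whatever the first transmission produced.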
Cross-Cutting Concerns
These principles apply to all phases:
- ⚓ Plain English according to Strunk & White: All documentation uses short sentences, active voice, and no unnecessary words.
- ⚓ Conventional Commits: All commits follow a standardized format (e.g. `feat(parser): handle quoted fields (#42)`) for a clean, parseable git history.
- ⚓ Docs-as-Code according to Ralf D. Müller: Documentation lives in the repository as AsciiDoc, built by docToolchain. Docs-as-Code treats documentation like source code: version-controlled, peer-reviewed, and built automatically.
- ⚓ Definition of Done: Code passes all tests, the feature branch is merged or a PR is created, documentation is updated, and architecture decisions are recorded.
Prerequisites
Before starting, set up your project infrastructure:
- Initialize a git repository
- Install docToolchain and download the ⚓ arc42 template
- Configure your AI coding environment with an `AGENTS.md` (or tool-specific equivalent like `CLAUDE.md`)
- Give the AI agent access to GitHub or GitLab via the CLI (`gh` or `glab`). The agent will need this later to create issues, pull requests, and ADR discussions. Consider using a dedicated account for audit traceability.
- Following Eichhorst’s Principle, set up error correction layers for your project: linters, pre-commit hooks, CI pipelines, and static analysis. Each layer catches a different class of error and makes the LLM channel more reliable. The Vibe Coding Risk Radar can help determine which checks are appropriate for your project’s risk profile. These checks unfold their full effect once the first lines of runnable code exist — set them up early, but revisit as the project grows.
cd <your-project>
curl -Lo dtcw https://doctoolchain.org/dtcw
chmod +x dtcw
./dtcw docker downloadTemplate
See the official installation guide for details and other platforms.
AGENTS.md as Project Memory
AGENTS.md is an open standard for guiding AI coding agents.
The file lives in your repository root and is read automatically at the start of every session.
It serves as the project memory: coding conventions, architectural decisions, file structure, and pointers to important documents.
Most AI coding tools support it or have an equivalent (Claude Code uses CLAUDE.md, for example).
A minimal AGENTS.md for this workflow:
# Project: <your project name>
## Key Documents
- PRD: src/docs/specs/prd.adoc
- Specification: src/docs/specs/
- Architecture: src/docs/arc42/
- Reviews: src/docs/reviews/
## Conventions
- Documentation: Plain English according to Strunk & White
- Testing: TDD (London or Chicago School as appropriate)
- Code: DRY, SOLID, KISS, Ubiquitous Language (DDD)
- Commits: Conventional Commits, reference issue number
- Branches: feature/<issue-description>
As the project progresses, the AI agent will maintain this file itself.
When starting a new AI session, the agent reads AGENTS.md and immediately has the context it needs.
Tip: Compact the context before starting a new EPIC. Within a session, keep an eye on the context window. Compact the conversation manually at natural breakpoints (e.g. after completing an issue) rather than waiting for the model to auto-compact at an inconvenient moment and lose important context. After compaction, the agent picks up context from AGENTS.md.
Phase 1: Requirements Discovery
Step 1: Describe Your Vision
Start by explaining your idea to the AI in your own words. This is the one place where more is better. Cover:
- The problem you want to solve
- Who will use it (target audience)
- What the desired outcome looks like
- Constraints: budget, timeline, tech stack preferences, platform
- What it is not: explicit boundaries help the AI avoid scope creep
- Inspiration: similar tools or approaches you have seen
Cover these aspects in whatever order feels natural. The next step will bring structure to your raw ideas.
Step 2: Clarify Requirements with the Socratic Method
Prompt the AI to use the ⚓ Socratic Method to clarify requirements.
Use the Socratic Method to help me clarify requirements for [your project].
Ask me at most 3 questions at a time. Challenge my assumptions.
Keep asking until you fully understand the requirements.
This activates targeted questioning, assumption challenging, and productive use of not-knowing. The constraint "at most 3 questions" prevents question overload.
You can layer additional anchors for structured coverage:
Use the Socratic Method combined with MECE to clarify requirements.
⚓ MECE (Mutually Exclusive, Collectively Exhaustive) ensures questions cover all areas without overlap.
Continue the dialogue until both you and the AI are satisfied that the requirements are clear.
Step 3: Document as PRD
Ask the AI to write a ⚓ Product Requirements Document (PRD) and save it as AsciiDoc. A PRD captures the what and why, not the how: problem statement, goals, user personas, success criteria, and scope boundaries.
Write a PRD based on our discussion. Save it as src/docs/specs/prd.adoc.
Step 4: Create Detailed Specification
Step 4 builds the specification in layers: from scope discovery (Actor-Goal List) through persona-level use cases to technical system specifications and supplementary models.
Step 4a: Discover Scope with the Actor-Goal List
Start by discovering scope with an ⚓ Actor-Goal List:
Create an Actor-Goal List from the PRD.
For each actor, list every goal they have against the system.
Apply the Goal Level test: "Does the actor go home happy if this goal
is achieved?" — if yes, it's a User Goal. If not, it's a Subfunction
(extract only when reused). Save as src/docs/specs/actor-goal-list.adoc.
Step 4b: Persona Use Cases
Generate use cases at User Goal level from the Actor-Goal List. These describe what actors want to achieve and how the system responds — in prose, at a level that stakeholders can review.
Create Persona Use Cases from the PRD and Actor-Goal List.
For each User Goal, write a Use Case in Cockburn's Fully Dressed format:
- Primary Actor and Stakeholders & Interests
- Trigger, Preconditions
- Main Success Scenario (numbered steps)
- Extensions (alternative/failure paths, referencing step numbers)
- Postconditions (Success Guarantee and Minimal Guarantee)
- Business Rules (BR-001, BR-002, ...)
Then add for each Use Case:
- Activity Diagram covering all flows (not just the happy path)
- Acceptance criteria in Gherkin format (Given/When/Then)
Save as .adoc files in src/docs/specs/.
Each Use Case in ⚓ Cockburn’s Fully Dressed format defines:
- Primary Actor: Who initiates the use case and has the goal.
- Stakeholders & Interests: Who else cares about the outcome and what they need from it.
- Trigger: The specific event that starts the use case ("User clicks Submit", "System receives webhook"). Without a trigger, neither the AI nor a tester knows when the use case begins.
- Main Success Scenario: The numbered steps of the happy path, each describing observable system behavior.
- Extensions: Named branches (3a, 4b, …) for error handling, validation failures, and edge cases — referencing the step they branch from.
- Postconditions: The guaranteed system state after successful completion (Success Guarantee) and the minimum guarantee even when the use case fails (Minimal Guarantee). Postconditions are what your tests assert.
- Business Rules: Numbered rules (BR-001, BR-002, …) that capture validation constraints, calculation logic, or domain invariants. Business rules make implicit knowledge explicit. Without them, the AI invents its own validation logic — or skips it entirely.
The more precise the use case, the less the AI has to guess during implementation. Cockburn’s format is deliberately prose-based — it does not prescribe any specific notation. Activity Diagrams and Gherkin are complementary representations we layer on top:
⚓ Gherkin (Given/When/Then) provides acceptance criteria that are both human-readable and machine-testable. These criteria become the foundation for TDD later.
Activity Diagrams define flows, error paths, and edge cases visually — the AI can follow them during implementation to cover all branches, not just the happy path.
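To illustrate how a Gherkin criterion later becomes a TDD test, here is a minimal sketch for a hypothetical use case UC-01 ("add item to cart"). The `Cart` class, the prices, and the UC ID are invented for the example; the docstring carries the Use Case ID for traceability, as described in the implementation loop.

```python
class Cart:
    """Minimal domain object for the sketch."""
    def __init__(self) -> None:
        self.items: list[float] = []

    def add(self, price: float) -> None:
        self.items.append(price)

    @property
    def total(self) -> float:
        return sum(self.items)

def test_add_item_updates_total():
    """UC-01, Main Success Scenario step 3 (Gherkin: add item to empty cart)."""
    cart = Cart()              # Given an empty cart
    cart.add(9.99)             # When the user adds an item priced 9.99
    assert cart.total == 9.99  # Then the cart total equals the item price
```

The Given/When/Then structure of the criterion maps one-to-one onto the arrange/act/assert structure of the test.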
Step 4c: System Use Cases
Derive System Use Cases from the Persona Use Cases. Where Persona Use Cases describe what actors want, System Use Cases describe what the system does at its technical boundaries — API endpoints, CLI commands, events, file formats.
Derive System Use Cases from the Persona Use Cases.
For each system interface (API endpoint, CLI command, event, file format):
- Trigger and Preconditions (technical: HTTP method, auth token, system state)
- Input: format, validation rules, constraints
- Processing: steps the system performs (reference Business Rules)
- Output: response format, status codes, payload schema
- Error responses: codes, messages, recovery hints
- Non-functional: performance budget, rate limits, timeouts
Use EARS syntax (When/While/If/Shall) for individual requirements
where applicable. Save as src/docs/specs/system-use-cases.adoc.
System Use Cases bridge the gap between stakeholder-facing Persona Use Cases and implementation. They are the input the AI needs to generate correct API handlers, CLI parsers, and event processors without guessing at technical details.
Simon Martinelli’s AI Unified Process calls this the move from Business Use Cases to System Use Case Specifications — the same distinction at a different granularity.
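To show how a System Use Case maps to code, here is a sketch for a hypothetical "create user" endpoint with one invented business rule (BR-001: the email must contain "@"). Plain Python, no framework, so the spec-to-implementation mapping stays visible: input validation, processing, output with status code, and an error response with a recovery hint.

```python
def create_user(payload: dict) -> tuple[int, dict]:
    """Returns (status_code, body) as the System Use Case would specify."""
    email = payload.get("email")

    # Input validation per BR-001 (hypothetical rule from the spec)
    if not email or "@" not in email:
        # Error response: code, message, recovery hint
        return 422, {"error": "invalid email", "hint": "expected name@domain"}

    # Processing succeeded; output per the specified payload schema
    return 201, {"email": email}

print(create_user({"email": "ada@example.org"}))  # success path: 201
print(create_user({"email": "not-an-email"}))     # extension path: 422
```

Every branch of the function corresponds to a line in the System Use Case, which is exactly what lets the AI generate it without guessing.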
Step 4d: Supplementary Specifications
Not every requirement fits into a use case. Create supplementary specifications as needed based on the project type:
Based on the Use Cases, create supplementary specifications:
- Entity Model: entities, attributes, relationships, constraints (as PlantUML ERD)
- State Machines: for entities with lifecycle behavior (as PlantUML state diagrams)
- Interface Contracts: DTOs and schemas at system boundaries
- Validation Rules: cross-cutting rules not tied to a single use case
Save in src/docs/specs/.
Which supplementary specs you need depends on the project. A CLI tool may only need an Entity Model. A web application typically needs all four. The AI will ask if something is missing during implementation — that feedback loop (Step 8) catches gaps early.
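A state machine spec translates almost mechanically into code. The sketch below uses a hypothetical order lifecycle; the states and transitions are invented for illustration, and the `TRANSITIONS` table is the code equivalent of the PlantUML state diagram.

```python
from enum import Enum, auto

class OrderState(Enum):
    DRAFT = auto()
    SUBMITTED = auto()
    SHIPPED = auto()
    CANCELLED = auto()

# Allowed transitions, mirroring the PlantUML state diagram in the spec
TRANSITIONS = {
    OrderState.DRAFT: {OrderState.SUBMITTED, OrderState.CANCELLED},
    OrderState.SUBMITTED: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),      # terminal state
    OrderState.CANCELLED: set(),    # terminal state
}

def transition(current: OrderState, target: OrderState) -> OrderState:
    """Enforce the lifecycle: reject any transition the spec does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because the table is explicit, a test can walk every edge of the diagram and assert that every other edge raises.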
Phase 2: Architecture
Step 5: Create arc42 Architecture Documentation
Ask the AI to derive an architecture from the specification:
Fill the arc42 template in src/docs/arc42/ based on the specification in src/docs/specs/.
Use PlantUML C4 diagrams for architecture visualization.
The arc42 template was downloaded in the prerequisites step. The AI knows the template structure and fills the 12 sections appropriately.
⚓ arc42 provides 12 sections covering everything from context to deployment.
⚓ C4 Diagrams combined with PlantUML provide text-based architecture visualization at four levels: Context, Container, Component, Code. Since the documentation uses AsciiDoc, PlantUML and other text-to-diagram tools are supported natively — the AI generates diagrams as code, and the build renders them automatically.
Architecture Decision Records
Architecture decisions are documented as ⚓ ADRs according to Nygard. Each ADR follows the structure: Title, Status, Context, Decision, Consequences.
For each decision, create a ⚓ Pugh Matrix with a 3-point scale (-1, 0, +1) to evaluate alternatives against quality criteria.
Document this architecture decision as an ADR according to Nygard.
Evaluate alternatives using a Pugh Matrix with a 3-point scale (-1, 0, +1).
Align criteria with the quality goals defined in the arc42 documentation.
Create a GitHub issue for discussion before finalizing.
The AI creates each ADR as a GitHub/GitLab issue first. You review the issue, comment, or approve it. Only after your approval is the ADR incorporated into the arc42 documentation. This way, every architectural decision is traceable through the issue history. All ADRs must align with the quality requirements defined in arc42 Section 10.
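A Pugh Matrix on the 3-point scale is simple enough to compute by hand; the sketch below makes the mechanics concrete. The criteria, alternatives, and scores are entirely hypothetical: each alternative is scored against a baseline per criterion (-1 worse, 0 same, +1 better), and the column sums rank them.

```python
criteria = ["performance", "maintainability", "operational cost"]

scores = {                      # hypothetical scores vs. the baseline
    "PostgreSQL": [+1, +1, 0],
    "SQLite":     [0, +1, 0],
    "Flat files": [-1, -1, +1],
}

# Sanity check: one score per quality criterion
for alternative, values in scores.items():
    assert len(values) == len(criteria)

totals = {alternative: sum(values) for alternative, values in scores.items()}
winner = max(totals, key=totals.get)
print(totals, "->", winner)
```

In the real workflow the AI produces this as a table in the ADR issue; the point of the 3-point scale is that the arithmetic stays trivial and the discussion focuses on the scores, not the math.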
Step 6: Architecture Review (ATAM)
Conduct an architecture review using the ⚓ Architecture Tradeoff Analysis Method (ATAM):
Conduct an ATAM review of the architecture in src/docs/arc42/.
Focus on the quality attributes defined in our quality goals.
Document the results as a review report in src/docs/reviews/.
ATAM systematically evaluates architecture against quality attribute scenarios. This step can be repeated after significant architectural changes.
Phase 3: Implementation Planning
Step 7: Create Backlog
Generate implementation issues from the specification:
Create EPICs and User Stories as GitHub issues based on the specification
in src/docs/specs/. Reference the arc42 documentation for technical context.
Follow the INVEST criteria for User Stories.
Use MoSCoW prioritization for the initial backlog order.
Mark dependencies between issues with labels or cross-references.
⚓ INVEST ensures User Stories are Independent, Negotiable, Valuable, Estimable, Small, and Testable.
⚓ MoSCoW (Must have, Should have, Could have, Won’t have) provides clear prioritization.
The initial backlog order follows the EPIC sequence. The AI should document dependencies between issues (e.g. "blocked by #12") so that the implementation order is clear. As the project evolves, groom the backlog regularly to re-prioritize based on new insights.
Phase 4: Implementation Loop
Step 8: Implement Issue by Issue
Create a feature branch for this EPIC.
Select the next logical issue from the backlog (respect dependencies).
Analyze it and document your analysis as a comment on the issue.
Implement it using TDD (choose London or Chicago School as appropriate).
Each test references its Use Case ID for traceability. Commit when done.
Check if the spec or architecture docs need updating.
For each issue:
- Analyze: The AI examines the issue, reviews related specs and architecture, checks for dependencies on other issues, and posts an analysis comment on the issue.
- Implement with TDD: The AI writes tests first, then implementation. Each test references the Use Case ID it verifies (e.g. UC-01). The AI chooses the appropriate mechanism for the language — annotations in Java, comments in JavaScript, docstrings in Python. This creates traceability from tests back to specifications without additional tooling.
- Commit: After the issue is implemented and all tests pass, commit with a reference to the issue number.
- Check docs: Ask whether the specification or architecture documentation needs updating based on what was learned during implementation.
TDD (Test-Driven Development) comes in two schools:
- ⚓ TDD, London School (mockist): isolate the unit under test, mock dependencies. Good for interaction-heavy code.
- ⚓ TDD, Chicago School (classicist): test behavior through the public API, use real collaborators. Good for state-based logic.
The AI selects the appropriate school based on the code’s characteristics.
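The difference between the two schools is easiest to see side by side. The sketch below uses invented examples: a notification function tested London-style (mock the collaborator, assert the interaction) and a counter tested Chicago-style (real object, assert the resulting state).

```python
from unittest.mock import Mock

def notify_on_signup(user: str, mailer) -> None:
    """Interaction-heavy code: delegates to a collaborator."""
    mailer.send(user, "Welcome!")

def test_london_style():
    # London School: mock the dependency, verify the interaction happened
    mailer = Mock()
    notify_on_signup("ada", mailer)
    mailer.send.assert_called_once_with("ada", "Welcome!")

class Counter:
    """State-based logic: behavior observable through the public API."""
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        self.value += 1

def test_chicago_style():
    # Chicago School: use the real object, verify the resulting state
    counter = Counter()
    counter.increment()
    counter.increment()
    assert counter.value == 2
```

London tests answer "did the unit talk to its collaborators correctly?"; Chicago tests answer "did the system end up in the right state?" — which is why the choice depends on the code's characteristics.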
Feedback Loop to Specification
During implementation, you will discover gaps in the specification or architecture. This is normal and expected. Periodically ask the AI:
Based on what we learned during implementation, does the specification
or architecture documentation need updating?
Update the docs before continuing. This keeps the specification a living document rather than a stale artifact. The spec is not just a means to generate code — it remains the authoritative description of the system’s behavior, maintained alongside the code for as long as the project lives.
Phase 5: Quality Assurance
Step 9: Code Review
Ask the AI to review the codebase:
Conduct a code review following the Fagan Inspection method.
Focus on correctness, maintainability, and adherence to the specification.
Document findings as a review report in src/docs/reviews/.
Fagan Inspection is a structured, systematic review process with defined roles and phases. It catches defects that testing alone cannot find.
Note: Consider using a different AI model or a fresh AI session for reviews. An AI reviewing its own code has the same blind spots as a developer reviewing their own work.
Step 10: Security Review
Conduct a security review based on the OWASP Top 10.
Document findings and recommendations in src/docs/reviews/.
Create GitHub issues for any vulnerabilities found.
⚓ OWASP Top 10 covers the most critical web application security risks. Even for CLI tools or libraries, the methodology identifies common vulnerability patterns.
Step 11: Exploratory Testing
Once the code reaches V1 status, conduct thorough exploratory testing:
Execute the tool and test it thoroughly. Focus on edge cases, unexpected
inputs, and error handling. The goal is to find bugs, not to confirm
that it works. Create a GitHub issue immediately for each bug you find.
This differs from the TDD tests written during implementation. Those tests verify specified behavior. Exploratory testing deliberately searches for unspecified behavior and edge cases. The AI executes the tool directly and interacts with it as a real user would.
Adapting This Workflow
This workflow is optimized for greenfield projects with a single developer and AI assistance. Some considerations for other contexts:
- Team projects: Use Pull Requests instead of direct merges. ADR discussions in GitHub issues become more valuable with multiple reviewers. Consider assigning different AI models to different review roles.
- Larger projects: Break the work into multiple feature branches per EPIC. Run ATAM reviews after each major architectural change, not just once. Groom the backlog more frequently as the project grows.
- Legacy codebases: This workflow assumes you start from scratch. See Adapting the Workflow to Brownfield Projects for a dedicated guide covering reverse engineering, baseline test coverage, and incremental migration using bounded contexts.
- Frontend / HTML applications: This workflow works well for backend code, CLIs, and libraries where the compiler and TDD provide strong error correction. Frontend applications can be built with the same approach, but consistent UI is harder to verify automatically. The error correction layers that make backend development reliable (compiler errors, test failures) have no direct equivalent for visual consistency. To compensate, start with a design system or component library and test individual components before full-page integration. Add visual regression testing with Playwright screenshot comparisons to your CI pipeline. This adds significant effort compared to backend projects, but without it the AI tends to produce functional but visually inconsistent results.
Prompt Cheat Sheet
The essential one-liners for each phase, with the Semantic Anchors they activate:
| Phase | Prompt | Anchors |
|---|---|---|
| Requirements | Use the Socratic Method combined with MECE to clarify requirements | Socratic Method, MECE |
| PRD | Write a PRD based on our discussion | PRD |
| Persona Use Cases | Create Persona Use Cases in Cockburn's Fully Dressed format, with Activity Diagrams and Gherkin acceptance criteria | Cockburn Fully Dressed, Gherkin |
| System Use Cases | Derive System Use Cases from the Persona Use Cases, using EARS syntax where applicable | EARS |
| Supplementary Specs | Create supplementary specifications: entity model, state machines, interface contracts, validation rules | — |
| Architecture | Fill the arc42 template, using PlantUML C4 diagrams | arc42, C4, PlantUML |
| ADR | Document the decision as an ADR according to Nygard; evaluate alternatives with a Pugh Matrix | ADR (Nygard), Pugh Matrix |
| ATAM | Conduct an ATAM review of the architecture | ATAM |
| Backlog | Create EPICs and User Stories following INVEST; prioritize with MoSCoW | INVEST, MoSCoW |
| Implementation | Implement the issue using TDD (London or Chicago School as appropriate) | TDD London / Chicago, SOLID, DRY, DDD |
| Code Review | Conduct a code review following the Fagan Inspection method | Fagan Inspection |
| Security | Conduct a security review based on the OWASP Top 10 | OWASP Top 10 |
| Testing | Test the tool thoroughly; focus on edge cases and unexpected inputs | — |
Cross-cutting anchors that apply to all phases: Docs-as-Code (Ralf D. Müller), Conventional Commits, Plain English / Strunk & White.
A Note on Tooling
This document describes a plain workflow using standard prompts. You will notice that none of the prompts use role-playing patterns like "You are an expert tester" or "Act as a senior architect." That is intentional. Semantic Anchors activate the right knowledge domain without pretending the model is something it is not. "Conduct a Fagan Inspection" is more precise than "You are a world-class code reviewer."
In practice, the workflow benefits greatly from specialized tooling. MCP servers like Responsible Vibe MCP enforce agentic workflows with phase gates and review checkpoints. Serena provides semantic code navigation so the agent can work with symbols and references instead of raw text. Skill systems like Superpowers encode reusable process patterns like TDD, debugging, and code review workflows. Playwright MCP lets the agent control a browser for end-to-end testing and exploratory testing of web applications. These tools automate the discipline this document describes manually.
The goal of this document is transparency. Every step is visible, every prompt is reproducible, and nothing depends on proprietary tooling. Once you understand the process, adopt whatever tools make it easier to follow.
Conclusion
This workflow demonstrates how Semantic Anchors make AI-assisted development systematic and repeatable. Each prompt stays short because the anchor activates the full concept. The error correction layers (specification, architecture review, TDD, code review, security review, exploratory testing) stack up to make the noisy LLM channel reliable.
You do not need to be an expert in every methodology listed here. The AI knows these concepts. Your job is to know which anchor to use when, and to verify the results.
Try it on your next project. Start with the Socratic Method for requirements and see where it leads. If you discover anchors that work well in your workflow, contribute them to the collection.
Once you are comfortable with this greenfield workflow, you can adapt it to existing codebases. See Adapting the Workflow to Brownfield Projects for a step-by-step guide.
Further Reading
- Birgitta Böckeler, Exploring Gen AI (martinfowler.com) — a critical analysis of spec-driven development tools (Kiro, spec-kit, Tessl) and their trade-offs. Examines where elaborate upfront specifications help and where they create overhead.
- Simon Martinelli, AI Unified Process — a requirements-driven methodology combining Rational Unified Process principles with AI tooling. Treats AI as a consistency engine that regenerates code from evolving specifications, with four phases: Inception, Elaboration, Construction, Transition.
- Ralf D. Müller & Simon Martinelli, Spec-Driven Development (software-architektur.tv, Episode 298) — podcast discussion on how specifications and requirements take center stage in AI-assisted development, and why iterative refinement beats perfect upfront specs.