Spec-Driven Development with Semantic Anchors
Introduction
This document describes how to build production-quality software with AI agents, guided by Semantic Anchors.
Semantic Anchors are compact terms that reliably activate rich knowledge domains in LLMs. Instead of writing pages of instructions, you reference a concept the model already understands deeply. "Use TDD, London School" is shorter than explaining test-driven development with mocks and outside-in design. "Follow arc42" is shorter than describing 12 architecture sections. The prompts stay short, precise, and maintainable.
This workflow was used to build three open source projects, all 100% AI-generated. The golden rule: I only prompt; I never touch the code myself. Every line of code, every test, every documentation file was written by the AI under my guidance.
- dacli: A full CLI tool with spec, architecture docs, tests, and user manual. Built by one AI, cross-reviewed by 5 different LLMs.
- Semantic Anchors Website: 228+ GitHub stars, documentation, and a video series.
- Vibe Coding Risk Radar: An interactive web app for assessing AI coding risks.
Note: Semantic Anchors marked with ⚓ are highlighted throughout this document. Click on any anchor to see its full definition on the Semantic Anchors website.
Note: This workflow is designed for greenfield projects built from scratch with AI assistance. For existing codebases, see Adapting the Workflow to Brownfield Projects.
The Key Principle: Small Steps, High Autonomy
The most common mistake with AI coding is prompting "Build me a CLI tool" and expecting a working result. That is like asking a junior developer to build an entire application from a one-sentence briefing. The result will be unpredictable at best.
This workflow takes the opposite approach: break the work into small, well-defined steps and let the AI handle each one autonomously. Each step produces a concrete artifact that you can review if you see the need for it.
The paradox: the smaller you make each task, the more autonomy you can give the agent. A vague "build me X" needs constant supervision. A precise "implement issue #42 using TDD, respecting the spec and architecture in src/docs/" can run on its own. The phases described in this document are designed to produce exactly that kind of precise, self-contained task.
This connects directly to Eichhorst’s Principle, which applies Shannon’s noisy channel theorem to LLM coding. An LLM is not a deterministic tool. It is a noisy, non-deterministic channel. It hallucinates, loses context, and is sometimes plain wrong. But an agent in a loop corrects itself: the compiler reports an error, the agent reads it, fixes the code, runs the tests, reads the failure, fixes the logic, and repeats until green. That is not magic — that is error correction, exactly as Shannon described.
When you prompt an LLM and paste the result into your project, you run an open loop. No compiler check, no test suite, no review. The LLM guesses once and you hope it guessed right. When an agent writes code, runs the compiler, runs the tests, and iterates until everything passes, that is a closed loop. The same principle that makes a thermostat work.
Different tests correct different error classes. The compiler catches syntax errors. Unit tests catch logic errors. BDD tests catch domain errors. Each layer increases the reliability of the channel. Untested code is an uncorrected channel — the noise passes straight through.
The consequence: better tests beat better prompts. A comprehensive test suite turns a mediocre model into a reliable coding partner. And if the complexity of a specification exceeds the capacity of the LLM, more tokens will not help. The answer is smaller specifications, clearer boundaries, and better tests.
Each small step in this workflow is a short transmission over the noisy channel. Short transmissions with error correction are far more reliable than one long, unchecked transmission.
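The closed loop described above can be sketched in a few lines. This is a deliberately toy simulation, not a real agent: `noisy_generate` stands in for the LLM (a noisy channel that improves when given concrete error feedback), and `run_checks` stands in for the error-correction layers (compiler, test suite).

```python
def run_checks(code: str) -> list[str]:
    """Stand-in for the error-correction layers (compiler, tests).
    Returns a list of error messages; empty means everything passes."""
    errors = []
    if "return a + b" not in code:
        errors.append("FAILED test_add: expected 3, got None")
    return errors

def noisy_generate(feedback: list[str]) -> str:
    """Stand-in for the LLM: the first attempt is wrong, but given
    concrete error feedback it produces the fix."""
    if any("FAILED" in msg for msg in feedback):
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    pass"

def closed_loop(max_attempts: int = 5) -> int:
    """Generate, check, feed the errors back, repeat until green."""
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        code = noisy_generate(feedback)
        feedback = run_checks(code)
        if not feedback:
            return attempt  # green: the loop corrected the noise
    return -1  # task exceeds channel capacity: split it into smaller steps

print(closed_loop())  # the toy loop goes green on the second attempt
```

The open loop is `noisy_generate([])` with no check afterwards: you ship whatever the first transmission produced.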
Cross-Cutting Concerns
These principles apply to all phases:
- ⚓ Plain English according to Strunk & White: All documentation uses short sentences, active voice, and no unnecessary words.
- ⚓ Conventional Commits: All commits follow a standardized format (e.g. `feat(parser): handle quoted fields (#42)`) for a clean, parseable git history.
- ⚓ Docs-as-Code according to Ralf D. Müller: Documentation lives in the repository as AsciiDoc, built by docToolchain. Docs-as-Code treats documentation like source code: version-controlled, peer-reviewed, and built automatically.
- ⚓ Definition of Done: Code passes all tests, the feature branch is merged or a PR is created, documentation is updated, and architecture decisions are recorded.
Prerequisites
Before starting, set up your project infrastructure:
- Initialize a git repository
- Install docToolchain and download the ⚓ arc42 template
- Configure your AI coding environment with an `AGENTS.md` (or tool-specific equivalent like `CLAUDE.md`)
- Give the AI agent access to GitHub or GitLab via the CLI (`gh` or `glab`). The agent will need this later to create issues, pull requests, and ADR discussions. Consider using a dedicated account for audit traceability.
- Following Eichhorst’s Principle, set up error correction layers for your project: linters, pre-commit hooks, CI pipelines, and static analysis. Each layer catches a different class of error and makes the LLM channel more reliable. The Vibe Coding Risk Radar can help determine which checks are appropriate for your project’s risk profile. These checks unfold their full effect once the first lines of runnable code exist — set them up early, but revisit as the project grows.
cd <your-project>
curl -Lo dtcw https://doctoolchain.org/dtcw
chmod +x dtcw
./dtcw docker downloadTemplate
See the official installation guide for details and other platforms.
AGENTS.md as Project Memory
AGENTS.md is an open standard for guiding AI coding agents.
The file lives in your repository root and is read automatically at the start of every session.
It serves as the project memory: coding conventions, architectural decisions, file structure, and pointers to important documents.
Most AI coding tools support it or have an equivalent (Claude Code uses CLAUDE.md, for example).
A minimal AGENTS.md for this workflow:
# Project: <your project name>
## Key Documents
- PRD: src/docs/specs/prd.adoc
- Specification: src/docs/specs/
- Architecture: src/docs/arc42/
- Reviews: src/docs/reviews/
## Conventions
- Documentation: Plain English according to Strunk & White
- Testing: TDD (London or Chicago School as appropriate)
- Code: DRY, SOLID, KISS, Ubiquitous Language (DDD)
- Commits: Conventional Commits, reference issue number
- Branches: feature/<issue-description>
As the project progresses, the AI agent will maintain this file itself.
When starting a new AI session, the agent reads AGENTS.md and immediately has the context it needs.
Tip: Compact the context before starting a new EPIC. Within a session, keep an eye on the context window. Compact the conversation manually at natural breakpoints (e.g. after completing an issue) rather than waiting for the model to auto-compact at an inconvenient moment and lose important context. After compaction, the agent picks up context from AGENTS.md.
Phase 1: Requirements Discovery
Step 1: Describe Your Vision
Start by explaining your idea to the AI in your own words. This is the one place where more is better. Cover:
- The problem you want to solve
- Who will use it (target audience)
- What the desired outcome looks like
- Constraints: budget, timeline, tech stack preferences, platform
- What it is not: explicit boundaries help the AI avoid scope creep
- Inspiration: similar tools or approaches you have seen
Cover these aspects in whatever order feels natural. The next step will bring structure to your raw ideas.
Step 2: Clarify Requirements with the Socratic Method
Prompt the AI to use the ⚓ Socratic Method to clarify requirements.
Use the Socratic Method to help me clarify requirements for [your project].
Ask me at most 3 questions at a time. Challenge my assumptions.
Keep asking until you fully understand the requirements.
This activates targeted questioning, assumption challenging, and productive use of not-knowing. The constraint "at most 3 questions" prevents question overload.
You can layer additional anchors for structured coverage:
Use the Socratic Method combined with MECE to clarify requirements.
⚓ MECE (Mutually Exclusive, Collectively Exhaustive) ensures questions cover all areas without overlap.
Continue the dialogue until both you and the AI are satisfied that the requirements are clear.
Step 3: Document as PRD
Ask the AI to write a ⚓ Product Requirements Document (PRD) and save it as AsciiDoc. A PRD captures the what and why, not the how: problem statement, goals, user personas, success criteria, and scope boundaries.
Write a PRD based on our discussion. Save it as src/docs/specs/prd.adoc.
Step 4: Create Detailed Specification
Step 4 builds the specification in layers: from scope discovery (Actor-Goal List) through persona-level use cases to technical system specifications and supplementary models.
Step 4a: Discover Scope with the Actor-Goal List
Start by discovering scope with an ⚓ Actor-Goal List:
Create an Actor-Goal List from the PRD.
For each actor, list every goal they have against the system.
Apply the Goal Level test: "Does the actor go home happy if this goal
is achieved?" — if yes, it's a User Goal. If not, it's a Subfunction
(extract only when reused). Save as src/docs/specs/actor-goal-list.adoc.
Step 4b: Persona Use Cases
Generate use cases at User Goal level from the Actor-Goal List. These describe what actors want to achieve and how the system responds — in prose, at a level that stakeholders can review.
Create Persona Use Cases from the PRD and Actor-Goal List.
For each User Goal, write a Use Case in Cockburn's Fully Dressed format:
- Primary Actor and Stakeholders & Interests
- Trigger, Preconditions
- Main Success Scenario (numbered steps)
- Extensions (alternative/failure paths, referencing step numbers)
- Postconditions (Success Guarantee and Minimal Guarantee)
- Business Rules (BR-001, BR-002, ...)
Then add for each Use Case:
- Activity Diagram covering all flows (not just the happy path)
- Acceptance criteria in Gherkin format (Given/When/Then)
Save as .adoc files in src/docs/specs/.
Each Use Case in ⚓ Cockburn’s Fully Dressed format defines:
- Primary Actor: Who initiates the use case and has the goal.
- Stakeholders & Interests: Who else cares about the outcome and what they need from it.
- Trigger: The specific event that starts the use case ("User clicks Submit", "System receives webhook"). Without a trigger, neither the AI nor a tester knows when the use case begins.
- Main Success Scenario: The numbered steps of the happy path, each describing observable system behavior.
- Extensions: Named branches (3a, 4b, …) for error handling, validation failures, and edge cases — referencing the step they branch from.
- Postconditions: The guaranteed system state after successful completion (Success Guarantee) and the minimum guarantee even when the use case fails (Minimal Guarantee). Postconditions are what your tests assert.
- Business Rules: Numbered rules (BR-001, BR-002, …) that capture validation constraints, calculation logic, or domain invariants. Business rules make implicit knowledge explicit. Without them, the AI invents its own validation logic — or skips it entirely.
The more precise the use case, the less the AI has to guess during implementation. Cockburn’s format is deliberately prose-based — it does not prescribe any specific notation. Activity Diagrams and Gherkin are complementary representations we layer on top:
⚓ Gherkin (Given/When/Then) provides acceptance criteria that are both human-readable and machine-testable. These criteria become the foundation for TDD later.
Activity Diagrams define flows, error paths, and edge cases visually — the AI can follow them during implementation to cover all branches, not just the happy path.
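To illustrate how a Gherkin criterion later becomes a TDD test, here is a minimal sketch for a hypothetical use case UC-01 ("add item to cart"). The `Cart` class, the prices, and the UC ID are invented for the example; the docstring carries the Use Case ID for traceability, as described in the implementation loop.

```python
class Cart:
    """Minimal domain object for the sketch."""
    def __init__(self) -> None:
        self.items: list[float] = []

    def add(self, price: float) -> None:
        self.items.append(price)

    @property
    def total(self) -> float:
        return sum(self.items)

def test_add_item_updates_total():
    """UC-01, Main Success Scenario step 3 (Gherkin: add item to empty cart)."""
    cart = Cart()              # Given an empty cart
    cart.add(9.99)             # When the user adds an item priced 9.99
    assert cart.total == 9.99  # Then the cart total equals the item price
```

The Given/When/Then structure of the criterion maps one-to-one onto the arrange/act/assert structure of the test.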
Step 4c: System Use Cases
Derive System Use Cases from the Persona Use Cases. Where Persona Use Cases describe what actors want, System Use Cases describe what the system does at its technical boundaries — API endpoints, CLI commands, events, file formats.
Derive System Use Cases from the Persona Use Cases.
For each system interface (API endpoint, CLI command, event, file format):
- Trigger and Preconditions (technical: HTTP method, auth token, system state)
- Input: format, validation rules, constraints
- Processing: steps the system performs (reference Business Rules)
- Output: response format, status codes, payload schema
- Error responses: codes, messages, recovery hints
- Non-functional: performance budget, rate limits, timeouts
Use EARS syntax (When/While/If/Shall) for individual requirements
where applicable. Save as src/docs/specs/system-use-cases.adoc.
System Use Cases bridge the gap between stakeholder-facing Persona Use Cases and implementation. They are the input the AI needs to generate correct API handlers, CLI parsers, and event processors without guessing at technical details.
Simon Martinelli’s AI Unified Process calls this the move from Business Use Cases to System Use Case Specifications — the same distinction at a different granularity.
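To show how a System Use Case maps to code, here is a sketch for a hypothetical "create user" endpoint with one invented business rule (BR-001: the email must contain "@"). Plain Python, no framework, so the spec-to-implementation mapping stays visible: input validation, processing, output with status code, and an error response with a recovery hint.

```python
def create_user(payload: dict) -> tuple[int, dict]:
    """Returns (status_code, body) as the System Use Case would specify."""
    email = payload.get("email")

    # Input validation per BR-001 (hypothetical rule from the spec)
    if not email or "@" not in email:
        # Error response: code, message, recovery hint
        return 422, {"error": "invalid email", "hint": "expected name@domain"}

    # Processing succeeded; output per the specified payload schema
    return 201, {"email": email}

print(create_user({"email": "ada@example.org"}))  # success path: 201
print(create_user({"email": "not-an-email"}))     # extension path: 422
```

Every branch of the function corresponds to a line in the System Use Case, which is exactly what lets the AI generate it without guessing.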
Step 4d: Supplementary Specifications
Not every requirement fits into a use case. Create supplementary specifications as needed based on the project type:
Based on the Use Cases, create supplementary specifications:
- Entity Model: entities, attributes, relationships, constraints (as PlantUML ERD)
- State Machines: for entities with lifecycle behavior (as PlantUML state diagrams)
- Interface Contracts: DTOs and schemas at system boundaries
- Validation Rules: cross-cutting rules not tied to a single use case
Save in src/docs/specs/.
Which supplementary specs you need depends on the project. A CLI tool may only need an Entity Model. A web application typically needs all four. The AI will ask if something is missing during implementation — that feedback loop (Step 8) catches gaps early.
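A state machine spec translates almost mechanically into code. The sketch below uses a hypothetical order lifecycle; the states and transitions are invented for illustration, and the `TRANSITIONS` table is the code equivalent of the PlantUML state diagram.

```python
from enum import Enum, auto

class OrderState(Enum):
    DRAFT = auto()
    SUBMITTED = auto()
    SHIPPED = auto()
    CANCELLED = auto()

# Allowed transitions, mirroring the PlantUML state diagram in the spec
TRANSITIONS = {
    OrderState.DRAFT: {OrderState.SUBMITTED, OrderState.CANCELLED},
    OrderState.SUBMITTED: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),      # terminal state
    OrderState.CANCELLED: set(),    # terminal state
}

def transition(current: OrderState, target: OrderState) -> OrderState:
    """Enforce the lifecycle: reject any transition the spec does not allow."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because the table is explicit, a test can walk every edge of the diagram and assert that every other edge raises.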
Phase 2: Architecture
Step 5: Create arc42 Architecture Documentation
Ask the AI to derive an architecture from the specification:
Fill the arc42 template in src/docs/arc42/ based on the specification in src/docs/specs/.
Use PlantUML C4 diagrams for architecture visualization.
The arc42 template was downloaded in the prerequisites step. The AI knows the template structure and fills the 12 sections appropriately.
⚓ arc42 provides 12 sections covering everything from context to deployment.
⚓ C4 Diagrams combined with PlantUML provide text-based architecture visualization at four levels: Context, Container, Component, Code. Since the documentation uses AsciiDoc, PlantUML and other text-to-diagram tools are supported natively — the AI generates diagrams as code, and the build renders them automatically.
Architecture Decision Records
Architecture decisions are documented as ⚓ ADRs according to Nygard. Each ADR follows the structure: Title, Status, Context, Decision, Consequences.
For each decision, create a ⚓ Pugh Matrix with a 3-point scale (-1, 0, +1) to evaluate alternatives against quality criteria.
Document this architecture decision as an ADR according to Nygard.
Evaluate alternatives using a Pugh Matrix with a 3-point scale (-1, 0, +1).
Align criteria with the quality goals defined in the arc42 documentation.
Create a GitHub issue for discussion before finalizing.
The AI creates each ADR as a GitHub/GitLab issue first. You review the issue, comment, or approve it. Only after your approval is the ADR incorporated into the arc42 documentation. This way, every architectural decision is traceable through the issue history. All ADRs must align with the quality requirements defined in arc42 Section 10.
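A Pugh Matrix on the 3-point scale is simple enough to compute by hand; the sketch below makes the mechanics concrete. The criteria, alternatives, and scores are entirely hypothetical: each alternative is scored against a baseline per criterion (-1 worse, 0 same, +1 better), and the column sums rank them.

```python
criteria = ["performance", "maintainability", "operational cost"]

scores = {                      # hypothetical scores vs. the baseline
    "PostgreSQL": [+1, +1, 0],
    "SQLite":     [0, +1, 0],
    "Flat files": [-1, -1, +1],
}

# Sanity check: one score per quality criterion
for alternative, values in scores.items():
    assert len(values) == len(criteria)

totals = {alternative: sum(values) for alternative, values in scores.items()}
winner = max(totals, key=totals.get)
print(totals, "->", winner)
```

In the real workflow the AI produces this as a table in the ADR issue; the point of the 3-point scale is that the arithmetic stays trivial and the discussion focuses on the scores, not the math.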
Step 6: Architecture Review (ATAM)
Conduct an architecture review using the ⚓ Architecture Tradeoff Analysis Method (ATAM):
Conduct an ATAM review of the architecture in src/docs/arc42/.
Focus on the quality attributes defined in our quality goals.
Document the results as a review report in src/docs/reviews/.
ATAM systematically evaluates architecture against quality attribute scenarios. This step can be repeated after significant architectural changes.
Phase 3: Implementation Planning
Step 7: Create Backlog
Generate implementation issues from the specification:
Create EPICs and User Stories as GitHub issues based on the specification
in src/docs/specs/. Reference the arc42 documentation for technical context.
Follow the INVEST criteria for User Stories.
Use MoSCoW prioritization for the initial backlog order.
Mark dependencies between issues with labels or cross-references.
⚓ INVEST ensures User Stories are Independent, Negotiable, Valuable, Estimable, Small, and Testable.
⚓ MoSCoW (Must have, Should have, Could have, Won’t have) provides clear prioritization.
The initial backlog order follows the EPIC sequence. The AI should document dependencies between issues (e.g. "blocked by #12") so that the implementation order is clear. As the project evolves, groom the backlog regularly to re-prioritize based on new insights.
Phase 4: Implementation Loop
Step 8: Implement Issue by Issue
Create a feature branch for this EPIC.
Select the next logical issue from the backlog (respect dependencies).
Analyze it and document your analysis as a comment on the issue.
Implement it using TDD (choose London or Chicago School as appropriate).
Each test references its Use Case ID for traceability. Commit when done.
Check if the spec or architecture docs need updating.
For each issue:
- Analyze: The AI examines the issue, reviews related specs and architecture, checks for dependencies on other issues, and posts an analysis comment on the issue.
- Implement with TDD: The AI writes tests first, then implementation. Each test references the Use Case ID it verifies (e.g. UC-01). The AI chooses the appropriate mechanism for the language — annotations in Java, comments in JavaScript, docstrings in Python. This creates traceability from tests back to specifications without additional tooling.
- Commit: After the issue is implemented and all tests pass, commit with a reference to the issue number.
- Check docs: Ask whether the specification or architecture documentation needs updating based on what was learned during implementation.
TDD (Test-Driven Development) comes in two schools:
- ⚓ TDD, London School (mockist): isolate the unit under test, mock dependencies. Good for interaction-heavy code.
- ⚓ TDD, Chicago School (classicist): test behavior through the public API, use real collaborators. Good for state-based logic.
The AI selects the appropriate school based on the code’s characteristics.
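The difference between the two schools is easiest to see side by side. The sketch below uses invented examples: a notification function tested London-style (mock the collaborator, assert the interaction) and a counter tested Chicago-style (real object, assert the resulting state).

```python
from unittest.mock import Mock

def notify_on_signup(user: str, mailer) -> None:
    """Interaction-heavy code: delegates to a collaborator."""
    mailer.send(user, "Welcome!")

def test_london_style():
    # London School: mock the dependency, verify the interaction happened
    mailer = Mock()
    notify_on_signup("ada", mailer)
    mailer.send.assert_called_once_with("ada", "Welcome!")

class Counter:
    """State-based logic: behavior observable through the public API."""
    def __init__(self) -> None:
        self.value = 0

    def increment(self) -> None:
        self.value += 1

def test_chicago_style():
    # Chicago School: use the real object, verify the resulting state
    counter = Counter()
    counter.increment()
    counter.increment()
    assert counter.value == 2
```

London tests answer "did the unit talk to its collaborators correctly?"; Chicago tests answer "did the system end up in the right state?" — which is why the choice depends on the code's characteristics.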
Feedback Loop to Specification
During implementation, you will discover gaps in the specification or architecture. This is normal and expected. Periodically ask the AI:
Based on what we learned during implementation, does the specification
or architecture documentation need updating?
Update the docs before continuing. This keeps the specification a living document rather than a stale artifact. The spec is not just a means to generate code — it remains the authoritative description of the system’s behavior, maintained alongside the code for as long as the project lives.
Phase 5: Quality Assurance
Step 9: Code Review
Ask the AI to review the codebase:
Conduct a code review following the Fagan Inspection method.
Focus on correctness, maintainability, and adherence to the specification.
Document findings as a review report in src/docs/reviews/.
Fagan Inspection is a structured, systematic review process with defined roles and phases. It catches defects that testing alone cannot find.
Note: Consider using a different AI model or a fresh AI session for reviews. An AI reviewing its own code has the same blind spots as a developer reviewing their own work.
Step 10: Security Review
Conduct a security review based on the OWASP Top 10.
Document findings and recommendations in src/docs/reviews/.
Create GitHub issues for any vulnerabilities found.
⚓ OWASP Top 10 covers the most critical web application security risks. Even for CLI tools or libraries, the methodology identifies common vulnerability patterns.
Step 11: Exploratory Testing
Once the code reaches V1 status, conduct thorough exploratory testing:
Execute the tool and test it thoroughly. Focus on edge cases, unexpected
inputs, and error handling. The goal is to find bugs, not to confirm
that it works. Create a GitHub issue immediately for each bug you find.
This differs from the TDD tests written during implementation. Those tests verify specified behavior. Exploratory testing deliberately searches for unspecified behavior and edge cases. The AI executes the tool directly and interacts with it as a real user would.
Adapting This Workflow
This workflow is optimized for greenfield projects with a single developer and AI assistance. Some considerations for other contexts:
- Team projects: Use Pull Requests instead of direct merges. ADR discussions in GitHub issues become more valuable with multiple reviewers. Consider assigning different AI models to different review roles.
- Larger projects: Break the work into multiple feature branches per EPIC. Run ATAM reviews after each major architectural change, not just once. Groom the backlog more frequently as the project grows.
- Legacy codebases: This workflow assumes you start from scratch. See Adapting the Workflow to Brownfield Projects for a dedicated guide covering reverse engineering, baseline test coverage, and incremental migration using bounded contexts.
- Frontend / HTML applications: This workflow works well for backend code, CLIs, and libraries where the compiler and TDD provide strong error correction. Frontend applications can be built with the same approach, but consistent UI is harder to verify automatically. The error correction layers that make backend development reliable (compiler errors, test failures) have no direct equivalent for visual consistency. To compensate, start with a design system or component library and test individual components before full-page integration. Add visual regression testing with Playwright screenshot comparisons to your CI pipeline. This adds significant effort compared to backend projects, but without it the AI tends to produce functional but visually inconsistent results.
Prompt Cheat Sheet
The essential one-liners for each phase, with the Semantic Anchors they activate:
| Phase | Prompt | Anchors |
|---|---|---|
| Requirements | Use the Socratic Method combined with MECE to clarify requirements | Socratic Method, MECE |
| PRD | Write a PRD based on our discussion | PRD |
| Persona Use Cases | Create Persona Use Cases in Cockburn's Fully Dressed format, with Activity Diagrams and Gherkin acceptance criteria | Cockburn Fully Dressed, Gherkin |
| System Use Cases | Derive System Use Cases from the Persona Use Cases, using EARS syntax where applicable | EARS |
| Supplementary Specs | Create supplementary specifications: entity model, state machines, interface contracts, validation rules | — |
| Architecture | Fill the arc42 template, using PlantUML C4 diagrams | arc42, C4, PlantUML |
| ADR | Document the decision as an ADR according to Nygard; evaluate alternatives with a Pugh Matrix | ADR (Nygard), Pugh Matrix |
| ATAM | Conduct an ATAM review of the architecture | ATAM |
| Backlog | Create EPICs and User Stories following INVEST; prioritize with MoSCoW | INVEST, MoSCoW |
| Implementation | Implement the issue using TDD (London or Chicago School as appropriate) | TDD London / Chicago, SOLID, DRY, DDD |
| Code Review | Conduct a code review following the Fagan Inspection method | Fagan Inspection |
| Security | Conduct a security review based on the OWASP Top 10 | OWASP Top 10 |
| Testing | Test the tool thoroughly; focus on edge cases and unexpected inputs | — |
Cross-cutting anchors that apply to all phases: Docs-as-Code (Ralf D. Müller), Conventional Commits, Plain English / Strunk & White.
A Note on Tooling
This document describes a plain workflow using standard prompts. You will notice that none of the prompts use role-playing patterns like "You are an expert tester" or "Act as a senior architect." That is intentional. Semantic Anchors activate the right knowledge domain without pretending the model is something it is not. "Conduct a Fagan Inspection" is more precise than "You are a world-class code reviewer."
In practice, the workflow benefits greatly from specialized tooling. MCP servers like Responsible Vibe MCP enforce agentic workflows with phase gates and review checkpoints. Serena provides semantic code navigation so the agent can work with symbols and references instead of raw text. Skill systems like Superpowers encode reusable process patterns like TDD, debugging, and code review workflows. Playwright MCP lets the agent control a browser for end-to-end testing and exploratory testing of web applications. These tools automate the discipline this document describes manually.
The goal of this document is transparency. Every step is visible, every prompt is reproducible, and nothing depends on proprietary tooling. Once you understand the process, adopt whatever tools make it easier to follow.
Conclusion
This workflow demonstrates how Semantic Anchors make AI-assisted development systematic and repeatable. Each prompt stays short because the anchor activates the full concept. The error correction layers (specification, architecture review, TDD, code review, security review, exploratory testing) stack up to make the noisy LLM channel reliable.
You do not need to be an expert in every methodology listed here. The AI knows these concepts. Your job is to know which anchor to use when, and to verify the results.
Try it on your next project. Start with the Socratic Method for requirements and see where it leads. If you discover anchors that work well in your workflow, contribute them to the collection.
Once you are comfortable with this greenfield workflow, you can adapt it to existing codebases. See Adapting the Workflow to Brownfield Projects for a step-by-step guide.
Further Reading
- Birgitta Böckeler, Exploring Gen AI (martinfowler.com) — a critical analysis of spec-driven development tools (Kiro, spec-kit, Tessl) and their trade-offs. Examines where elaborate upfront specifications help and where they create overhead.
- Simon Martinelli, AI Unified Process — a requirements-driven methodology combining Rational Unified Process principles with AI tooling. Treats AI as a consistency engine that regenerates code from evolving specifications, with four phases: Inception, Elaboration, Construction, Transition.
- Ralf D. Müller & Simon Martinelli, Spec-Driven Development (software-architektur.tv, Episode 298) — podcast discussion on how specifications and requirements take center stage in AI-assisted development, and why iterative refinement beats perfect upfront specs.