The Semantic Anchors Development Workflow
Introduction
This document describes how to build production-quality software with AI agents, guided by Semantic Anchors.
Semantic Anchors are compact terms that reliably activate rich knowledge domains in LLMs. Instead of writing pages of instructions, you reference a concept the model already understands deeply. "Use TDD, London School" is shorter than explaining test-driven development with mocks and outside-in design. "Follow arc42" is shorter than describing 12 architecture sections. The prompts stay short, precise, and maintainable.
This workflow was used to build three open source projects, all 100% AI-generated. The golden rule: I only prompt, I never touch the code myself. Every line of code, every test, every documentation file was written by the AI under my guidance.
- dacli: A full CLI tool with spec, architecture docs, tests, and user manual. Built by one AI, cross-reviewed by 5 different LLMs.
- Semantic Anchors Website: 228+ GitHub stars, documentation, and a video series.
- Vibe Coding Risk Radar: An interactive web app for assessing AI coding risks.
Semantic Anchors marked with ⚓ are highlighted throughout this document. Click on any anchor to see its full definition on the Semantic Anchors website.
This workflow is designed for greenfield projects built from scratch with AI assistance. For existing codebases, see Adapting the Workflow to Brownfield Projects.
The Key Principle: Small Steps, High Autonomy
The most common mistake with AI coding is prompting "Build me a CLI tool" and expecting a working result. That is like asking a junior developer to build an entire application from a one-sentence briefing. The result will be unpredictable at best.
This workflow takes the opposite approach: break the work into small, well-defined steps and let the AI handle each one autonomously. Each step produces a concrete artifact that you can review if you see the need for it.
The paradox: the smaller you make each task, the more autonomy you can give the agent. A vague "build me X" needs constant supervision. A precise "implement issue #42 using TDD, respecting the spec and architecture in src/docs/" can run on its own. The phases described in this document are designed to produce exactly that kind of precise, self-contained task.
This connects directly to Eichhorst’s Principle, which applies Shannon’s noisy channel theorem to LLM coding. An LLM is not a deterministic tool. It is a noisy, non-deterministic channel. It hallucinates, loses context, and is sometimes plain wrong. But an agent in a loop corrects itself: the compiler reports an error, the agent reads it, fixes the code, runs the tests, reads the failure, fixes the logic, and repeats until green. That is not magic — that is error correction, exactly as Shannon described.
When you prompt an LLM and paste the result into your project, you run an open loop. No compiler check, no test suite, no review. The LLM guesses once and you hope it guessed right. When an agent writes code, runs the compiler, runs the tests, and iterates until everything passes, that is a closed loop. The same principle that makes a thermostat work.
Different tests correct different error classes. The compiler catches syntax errors. Unit tests catch logic errors. BDD tests catch domain errors. Each layer increases the reliability of the channel. Untested code is an uncorrected channel — the noise passes straight through.
The consequence: better tests beat better prompts. A comprehensive test suite turns a mediocre model into a reliable coding partner. And if the complexity of a specification exceeds the capacity of the LLM, more tokens will not help. The answer is smaller specifications, clearer boundaries, and better tests.
Each small step in this workflow is a short transmission over the noisy channel. Short transmissions with error correction are far more reliable than one long, unchecked transmission.
Cross-Cutting Concerns
These principles apply to all phases:
- ⚓ Plain English according to Strunk & White: All documentation uses short sentences, active voice, and no unnecessary words.
- ⚓ Conventional Commits: All commits follow a standardized format for a clean, parseable git history (see the example after this list).
- ⚓ Docs-as-Code according to Ralf D. Müller: Documentation lives in the repository as AsciiDoc, built by docToolchain. Docs-as-Code treats documentation like source code: version-controlled, peer-reviewed, and built automatically.
- ⚓ Definition of Done: Code passes all tests, the feature branch is merged or a PR is created, documentation is updated, and architecture decisions are recorded.
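For illustration, a few hypothetical commit messages in Conventional Commits format (the types follow the standard; the scopes and issue numbers are placeholders):
feat(cli): add --dry-run flag (#42)
fix(parser): reject empty configuration keys (#57)
docs(arc42): record ADR-003 on persistence strategy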
Prerequisites
Before starting, set up your project infrastructure:
-
Initialize a git repository
-
Install docToolchain and download the ⚓ arc42 template
-
Configure your AI coding environment with an
AGENTS.md(or tool-specific equivalent likeCLAUDE.md) -
Give the AI agent access to GitHub or GitLab via the CLI (
ghorglab). The agent will need this later to create issues, pull requests, and ADR discussions. Consider using a dedicated account for audit traceability. -
Following Eichhorst’s Principle, set up error correction layers for your project: linters, pre-commit hooks, CI pipelines, and static analysis. Each layer catches a different class of error and makes the LLM channel more reliable. The Vibe Coding Risk Radar can help determine which checks are appropriate for your project’s risk profile. These checks unfold their full effect once the first lines of runnable code exist — set them up early, but revisit as the project grows.
cd <your-project>
curl -Lo dtcw https://doctoolchain.org/dtcw
chmod +x dtcw
./dtcw docker downloadTemplate
See the official installation guide for details and other platforms.
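As one possible error correction layer from the list above, here is a minimal CI pipeline sketch in GitHub Actions syntax. The lint and test scripts are placeholders you would replace with the tools of your stack:
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/lint.sh   # linter layer: catches style and static-analysis errors
      - run: ./scripts/test.sh   # test layer: catches logic and domain errors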
AGENTS.md as Project Memory
AGENTS.md is an open standard for guiding AI coding agents.
The file lives in your repository root and is read automatically at the start of every session.
It serves as the project memory: coding conventions, architectural decisions, file structure, and pointers to important documents.
Most AI coding tools support it or have an equivalent (Claude Code uses CLAUDE.md, for example).
A minimal AGENTS.md for this workflow:
# Project: <your project name>
## Key Documents
- PRD: src/docs/specs/prd.adoc
- Specification: src/docs/specs/
- Architecture: src/docs/arc42/
- Reviews: src/docs/reviews/
## Conventions
- Documentation: Plain English according to Strunk & White
- Testing: TDD (London or Chicago School as appropriate)
- Code: DRY, SOLID, KISS, Ubiquitous Language (DDD)
- Commits: Conventional Commits, reference issue number
- Branches: feature/<issue-description>
As the project progresses, the AI agent will maintain this file itself.
When starting a new AI session, the agent reads AGENTS.md and immediately has the context it needs.
Compact the context before starting a new EPIC.
Within a session, keep an eye on the context window.
Compact the conversation manually at natural breakpoints (e.g. after completing an issue) rather than waiting for the model to auto-compact at an inconvenient moment and lose important context.
After compaction, the agent picks up context from AGENTS.md.
Phase 1: Requirements Discovery
Step 1: Describe Your Vision
Start by explaining your idea to the AI in your own words. This is the one place where more is better. Cover:
- The problem you want to solve
- Who will use it (target audience)
- What the desired outcome looks like
- Constraints: budget, timeline, tech stack preferences, platform
- What it is not: explicit boundaries help the AI avoid scope creep
- Inspiration: similar tools or approaches you have seen
Cover these aspects in whatever order feels natural. The next step will bring structure to your raw ideas.
Step 2: Clarify Requirements with the Socratic Method
Prompt the AI to use the ⚓ Socratic Method to clarify requirements.
Use the Socratic Method to help me clarify requirements for [your project].
Ask me at most 3 questions at a time. Challenge my assumptions.
Keep asking until you fully understand the requirements.
This activates targeted questioning, assumption challenging, and productive use of not-knowing. The constraint "at most 3 questions" prevents question overload.
You can layer additional anchors for structured coverage:
Use the Socratic Method combined with MECE to clarify requirements.
⚓ MECE (Mutually Exclusive, Collectively Exhaustive) ensures questions cover all areas without overlap.
Continue the dialogue until both you and the AI are satisfied that the requirements are clear.
Step 3: Document as PRD
Ask the AI to write a ⚓ Product Requirements Document (PRD) and save it as AsciiDoc. A PRD captures the what and why, not the how: problem statement, goals, user personas, success criteria, and scope boundaries.
Write a PRD based on our discussion. Save it as src/docs/specs/prd.adoc.
Step 4: Create Detailed Specification
From the PRD, generate a full specification:
Create a detailed specification from the PRD. Include:
- Use Cases (Trigger, Main Flow, Alternative Flows, Postconditions, Business Rules)
- Activity Diagrams for all flows (not just the happy path)
- Acceptance criteria in Gherkin format
Save as .adoc files in src/docs/specs/.
Each Use Case must define five elements:
- Trigger: The specific event that starts the use case ("User clicks Submit", "System receives webhook"). Without a trigger, neither the AI nor a tester knows when the use case begins.
- Main Flow: The numbered steps of the happy path, each describing observable system behavior.
- Alternative Flows: Named branches (A1, A2, …) for error handling, validation failures, and edge cases.
- Postconditions: The guaranteed system state after successful completion. Postconditions are what your tests assert.
- Business Rules: Numbered rules (BR-001, BR-002, …) that capture validation constraints, calculation logic, or domain invariants. Business rules make implicit knowledge explicit. Without them, the AI invents its own validation logic, or skips it entirely.
This structure follows Alistair Cockburn’s Use Case format, which LLMs recognize reliably. The more precise the use case, the less the AI has to guess during implementation.
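A condensed, hypothetical example (the tool name, flows, and rules are invented for illustration):
UC-01: Import configuration file
Trigger:           User runs `mytool import <file>`
Main Flow:         1. Validate the file format (BR-001).
                   2. Merge the settings into the active profile.
                   3. Print an import summary.
Alternative Flows: A1: Invalid format: abort with an error message.
Postconditions:    The active profile contains the imported settings.
Business Rules:    BR-001: Only YAML and JSON files are accepted.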
⚓ Gherkin (Given/When/Then) provides acceptance criteria that are both human-readable and machine-testable. These criteria become the foundation for TDD later.
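A matching Gherkin scenario for the hypothetical UC-01 example might look like this:
Feature: Configuration import

  Scenario: Import a valid configuration file  # verifies UC-01, BR-001
    Given a YAML file with valid settings
    When the user runs "mytool import settings.yaml"
    Then the active profile contains the imported settings
    And the tool prints an import summary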
Activity Diagrams are an important part of the specification because they define flows, error paths, and edge cases in a way the AI can follow during implementation.
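Because the specification is text-based, activity diagrams can also be written as PlantUML. A minimal sketch of the hypothetical UC-01 flow, including its error path:
@startuml
start
:Read configuration file;
if (format valid per BR-001?) then (yes)
  :Merge settings into active profile;
  :Print import summary;
else (no)
  :Abort with error message;
endif
stop
@enduml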
Phase 2: Architecture
Step 5: Create arc42 Architecture Documentation
Ask the AI to derive an architecture from the specification:
Fill the arc42 template in src/docs/arc42/ based on the specification in src/docs/specs/.
Use PlantUML C4 diagrams for architecture visualization.
The arc42 template was downloaded in the prerequisites step. The AI knows the template structure and fills the 12 sections appropriately.
⚓ arc42 provides 12 sections covering everything from context to deployment.
⚓ C4 Diagrams combined with PlantUML provide text-based architecture visualization at four levels: Context, Container, Component, Code. Since the documentation uses AsciiDoc, PlantUML and other text-to-diagram tools are supported natively — the AI generates diagrams as code, and the build renders them automatically.
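A minimal Container diagram sketch, assuming the C4 macros from the PlantUML standard library (the system and container names are hypothetical):
@startuml
!include <C4/C4_Container>

Person(user, "CLI User")
System_Boundary(tool, "mytool") {
  Container(cli, "Command Layer", "Java", "Parses commands and options")
  Container(core, "Core", "Java", "Implements the use cases")
}
Rel(user, cli, "Runs commands")
Rel(cli, core, "Delegates to")
@enduml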
Architecture Decision Records
Architecture decisions are documented as ⚓ ADRs according to Nygard. Each ADR follows the structure: Title, Status, Context, Decision, Consequences.
For each decision, create a ⚓ Pugh Matrix with a 3-point scale (-1, 0, +1) to evaluate alternatives against quality criteria.
Document this architecture decision as an ADR according to Nygard.
Evaluate alternatives using a Pugh Matrix with a 3-point scale (-1, 0, +1).
Align criteria with the quality goals defined in the arc42 documentation.
Create a GitHub issue for discussion before finalizing.
The AI creates each ADR as a GitHub/GitLab issue first. You review the issue, comment, or approve it. Only after your approval is the ADR incorporated into the arc42 documentation. This way, every architectural decision is traceable through the issue history. All ADRs must align with the quality requirements defined in arc42 Section 10.
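A hypothetical Pugh Matrix for a persistence decision, with one option as the baseline. The criteria would come from the arc42 quality goals; the options and scores here are invented for illustration:
| Criterion | SQLite (baseline) | Plain JSON files | Embedded H2 |
|---|---|---|---|
| Startup time | 0 | +1 | -1 |
| Query flexibility | 0 | -1 | 0 |
| Operational simplicity | 0 | +1 | -1 |
| Sum | 0 | +1 | -2 |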
Step 6: Architecture Review (ATAM)
Conduct an architecture review using the ⚓ Architecture Tradeoff Analysis Method (ATAM):
Conduct an ATAM review of the architecture in src/docs/arc42/.
Focus on the quality attributes defined in our quality goals.
Document the results as a review report in src/docs/reviews/.
ATAM systematically evaluates architecture against quality attribute scenarios. This step can be repeated after significant architectural changes.
Phase 3: Implementation Planning
Step 7: Create Backlog
Generate implementation issues from the specification:
Create EPICs and User Stories as GitHub issues based on the specification
in src/docs/specs/. Reference the arc42 documentation for technical context.
Follow the INVEST criteria for User Stories.
Use MoSCoW prioritization for the initial backlog order.
Mark dependencies between issues with labels or cross-references.
⚓ INVEST ensures User Stories are Independent, Negotiable, Valuable, Estimable, Small, and Testable.
⚓ MoSCoW (Must have, Should have, Could have, Won’t have) provides clear prioritization.
The initial backlog order follows the EPIC sequence. The AI should document dependencies between issues (e.g. "blocked by #12") so that the implementation order is clear. As the project evolves, groom the backlog regularly to re-prioritize based on new insights.
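The agent typically creates these issues through the CLI configured in the prerequisites. A hypothetical example with the gh tool (title, label, and references are placeholders):
gh issue create \
  --title "Story: Import configuration file (UC-01)" \
  --label "must-have" \
  --body "Implements UC-01. Acceptance criteria in src/docs/specs/. Blocked by #12."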
Phase 4: Implementation Loop
Step 8: Implement Issue by Issue
Create a feature branch for this EPIC.
Select the next logical issue from the backlog (respect dependencies).
Analyze it and document your analysis as a comment on the issue.
Implement it using TDD (choose London or Chicago School as appropriate).
Each test references its Use Case ID for traceability. Commit when done.
Check if the spec or architecture docs need updating.
For each issue:
- Analyze: The AI examines the issue, reviews related specs and architecture, checks for dependencies on other issues, and posts an analysis comment on the issue.
- Implement with TDD: The AI writes tests first, then implementation. Each test references the Use Case ID it verifies (e.g. UC-01). The AI chooses the appropriate mechanism for the language: annotations in Java, comments in JavaScript, docstrings in Python. This creates traceability from tests back to specifications without additional tooling (see the test sketch below).
- Commit: After the issue is implemented and all tests pass, commit with a reference to the issue number.
- Check docs: Ask whether the specification or architecture documentation needs updating based on what was learned during implementation.
TDD (Test-Driven Development) comes in two schools:
- ⚓ TDD, London School (mockist): isolate the unit under test, mock dependencies. Good for interaction-heavy code.
- ⚓ TDD, Chicago School (classicist): test behavior through the public API, use real collaborators. Good for state-based logic.
The AI selects the appropriate school based on the code’s characteristics.
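A minimal sketch of such a traceable, London School test, assuming a TypeScript project with Vitest. The importConfig function, its file-reading collaborator, and UC-01 are hypothetical:
// tests/importConfig.test.ts
// Verifies UC-01: Import configuration file (traceability via comment)
import { describe, it, expect, vi } from "vitest";
import { importConfig } from "../src/importConfig";

describe("UC-01 Import configuration file", () => {
  it("merges a valid file into the active profile (BR-001)", async () => {
    // London School: the file-reading collaborator is mocked
    const reader = { read: vi.fn().mockResolvedValue("retries: 3") };
    const profile = await importConfig("settings.yaml", reader);
    expect(reader.read).toHaveBeenCalledWith("settings.yaml");
    expect(profile.retries).toBe(3);
  });
});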
Feedback Loop to Specification
During implementation, you will discover gaps in the specification or architecture. This is normal and expected. Periodically ask the AI:
Based on what we learned during implementation, does the specification
or architecture documentation need updating?
Update the docs before continuing. This keeps the specification a living document rather than a stale artifact. The spec is not just a means to generate code — it remains the authoritative description of the system’s behavior, maintained alongside the code for as long as the project lives.
Phase 5: Quality Assurance
Step 9: Code Review
Ask the AI to review the codebase:
Conduct a code review following the Fagan Inspection method.
Focus on correctness, maintainability, and adherence to the specification.
Document findings as a review report in src/docs/reviews/.
Fagan Inspection is a structured, systematic review process with defined roles and phases. It catches defects that testing alone cannot find.
Consider using a different AI model or a fresh AI session for reviews. An AI reviewing its own code has the same blind spots as a developer reviewing their own work.
Step 10: Security Review
Conduct a security review based on the OWASP Top 10.
Document findings and recommendations in src/docs/reviews/.
Create GitHub issues for any vulnerabilities found.
⚓ OWASP Top 10 covers the most critical web application security risks. Even for CLI tools or libraries, the methodology identifies common vulnerability patterns.
Step 11: Exploratory Testing
Once the code reaches V1 status, conduct thorough exploratory testing:
Execute the tool and test it thoroughly. Focus on edge cases, unexpected
inputs, and error handling. The goal is to find bugs, not to confirm
that it works. Create a GitHub issue immediately for each bug you find.
This differs from the TDD tests written during implementation. Those tests verify specified behavior. Exploratory testing deliberately searches for unspecified behavior and edge cases. The AI executes the tool directly and interacts with it as a real user would.
Adapting This Workflow
This workflow is optimized for greenfield projects with a single developer and AI assistance. Some considerations for other contexts:
- Team projects: Use Pull Requests instead of direct merges. ADR discussions in GitHub issues become more valuable with multiple reviewers. Consider assigning different AI models to different review roles.
- Larger projects: Break the work into multiple feature branches per EPIC. Run ATAM reviews after each major architectural change, not just once. Groom the backlog more frequently as the project grows.
- Legacy codebases: This workflow assumes you start from scratch. See Adapting the Workflow to Brownfield Projects for a dedicated guide covering reverse engineering, baseline test coverage, and incremental migration using bounded contexts.
- Frontend / HTML applications: This workflow works well for backend code, CLIs, and libraries where the compiler and TDD provide strong error correction. Frontend applications can be built with the same approach, but consistent UI is harder to verify automatically. The error correction layers that make backend development reliable (compiler errors, test failures) have no direct equivalent for visual consistency. To compensate, start with a design system or component library and test individual components before full-page integration. Add visual regression testing with Playwright screenshot comparisons to your CI pipeline (see the sketch after this list). This adds significant effort compared to backend projects, but without it the AI tends to produce functional but visually inconsistent results.
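A minimal visual regression sketch, assuming Playwright Test in a TypeScript project (the page path and snapshot name are placeholders):
// e2e/visual.spec.ts
import { test, expect } from "@playwright/test";

test("dashboard renders consistently", async ({ page }) => {
  await page.goto("/dashboard"); // baseURL comes from playwright.config.ts
  // Compares against a stored baseline screenshot; fails on visual drift
  await expect(page).toHaveScreenshot("dashboard.png", { maxDiffPixelRatio: 0.01 });
});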
Prompt Cheat Sheet
The essential one-liners for each phase, with the Semantic Anchors they activate:
| Phase | Prompt | Anchors |
|---|---|---|
| Requirements | Use the Socratic Method (with MECE) to clarify requirements; ask at most 3 questions at a time | Socratic Method, MECE |
| PRD | Write a PRD based on our discussion | PRD |
| Specification | Create a detailed specification with Use Cases, Activity Diagrams, and Gherkin acceptance criteria | Use Cases (Cockburn), Gherkin |
| Architecture | Fill the arc42 template; use PlantUML C4 diagrams | arc42, C4 Diagrams, PlantUML |
| ADR | Document the decision as an ADR according to Nygard; evaluate alternatives with a Pugh Matrix | ADR (Nygard), Pugh Matrix |
| ATAM | Conduct an ATAM review of the architecture | ATAM |
| Backlog | Create EPICs and User Stories following INVEST and MoSCoW | INVEST, MoSCoW |
| Implementation | Implement the issue using TDD (London or Chicago School) | TDD London / Chicago, SOLID, DRY, DDD |
| Code Review | Conduct a code review following the Fagan Inspection method | Fagan Inspection |
| Security | Conduct a security review based on the OWASP Top 10 | OWASP Top 10 |
| Testing | Execute the tool and test it thoroughly; focus on edge cases | — |
Cross-cutting anchors that apply to all phases: Docs-as-Code (Ralf D. Müller), Conventional Commits, Plain English / Strunk & White.
A Note on Tooling
This document describes a plain workflow using standard prompts. You will notice that none of the prompts use role-playing patterns like "You are an expert tester" or "Act as a senior architect." That is intentional. Semantic Anchors activate the right knowledge domain without pretending the model is something it is not. "Conduct a Fagan Inspection" is more precise than "You are a world-class code reviewer."
In practice, the workflow benefits greatly from specialized tooling. MCP servers like Responsible Vibe MCP enforce agentic workflows with phase gates and review checkpoints. Serena provides semantic code navigation so the agent can work with symbols and references instead of raw text. Skill systems like Superpowers encode reusable process patterns like TDD, debugging, and code review workflows. Playwright MCP lets the agent control a browser for end-to-end testing and exploratory testing of web applications. These tools automate the discipline this document describes manually.
The goal of this document is transparency. Every step is visible, every prompt is reproducible, and nothing depends on proprietary tooling. Once you understand the process, adopt whatever tools make it easier to follow.
Conclusion
This workflow demonstrates how Semantic Anchors make AI-assisted development systematic and repeatable. Each prompt stays short because the anchor activates the full concept. The error correction layers (specification, architecture review, TDD, code review, security review, exploratory testing) stack up to make the noisy LLM channel reliable.
You do not need to be an expert in every methodology listed here. The AI knows these concepts. Your job is to know which anchor to use when, and to verify the results.
Try it on your next project. Start with the Socratic Method for requirements and see where it leads. If you discover anchors that work well in your workflow, contribute them to the collection.
Once you are comfortable with this greenfield workflow, you can adapt it to existing codebases. See Adapting the Workflow to Brownfield Projects for a step-by-step guide.
Further Reading
-
Birgitta Böckeler, Exploring Gen AI (martinfowler.com) — a critical analysis of spec-driven development tools (Kiro, spec-kit, Tessl) and their trade-offs. Examines where elaborate upfront specifications help and where they create overhead.
-
Simon Martinelli, AI Unified Process — a requirements-driven methodology combining Rational Unified Process principles with AI tooling. Treats AI as a consistency engine that regenerates code from evolving specifications, with four phases: Inception, Elaboration, Construction, Transition.
-
Ralf D. Müller & Simon Martinelli, Spec-Driven Development (software-architektur.tv, Episode 298) — podcast discussion on how specifications and requirements take center stage in AI-assisted development, and why iterative refinement beats perfect upfront specs.