Vibe Architecture — ATAM Evaluation
Method and Scope
This is an Architecture Tradeoff Analysis Method (ATAM) evaluation of the
architecture documented in src/docs/arc42/. ATAM is scenario-based: it
maps quality-attribute scenarios onto architectural decisions and finds
the sensitivity points, tradeoff points and risks those decisions
create. It produces no score — it produces a map of where the
architecture is fragile and why.
The evaluation follows the ATAM outputs: business drivers, architectural approaches, a quality-attribute utility tree, scenario analysis, and a consolidated list of sensitivity points (S-n), tradeoff points (T-n), risks (R-n) and non-risks (NR-n).
|
Important
|
The requested priority list does not exist
The task asked for "the Q4.9 quality-goal ranking" to be used as the
priority list. Q4.9 is an This evaluation therefore proceeds on a provisional ranking (Provisional Quality-Goal Priority) derived, as the arc42 documents themselves do, from code-visible emphasis. Every prioritisation in this report — the utility tree weights, the scenario ordering, and which tradeoff is judged "acceptable" — inherits this deferral. This is recorded as the master risk [R-0]; individual scenarios that depend on it carry a ⚑ Deferred flag. |
Provisional Quality-Goal Priority
Q4.9 itself notes: "the README theme 'Safety first' and the permission system suggest Security/Usability are prioritised, but the code cannot rank the eight ISO 25010 characteristics." The arc42 Chapter 1.2 names five top goals; Chapter 10 marks three as derived. The provisional order below is used only so the ATAM can proceed — it is not authoritative.
| Rank | Quality Goal | Basis (provisional) |
|---|---|---|
1 |
Security / Safety |
"Safety first" README theme; permission system is the most elaborated subsystem (arc42 §8.2). |
2 |
Usability |
Q4.9 pairs Usability with Security; TUI, onboarding, autocompletion. |
3 |
Reliability |
Retry, auto-compaction, clean cancellation (arc42 §4, §6). |
4 |
Maintainability |
Strict CI gates, hexagonal seams (arc42 §2, ADR-001). |
5 |
Compatibility |
Multi-provider, MCP, ACP (arc42 §4, ADR-002). |
6–8 |
Performance Efficiency, Functional Suitability, Portability |
Chapter 10 "derived" characteristics. |
Business Drivers
The system is a client-side CLI coding agent that executes shell commands and edits files on a developer’s machine on behalf of an LLM (arc42 §1, §3). The dominant business tension is therefore intrinsic: the agent is useful in proportion to how much it may do without asking, and dangerous in the same proportion. The two named tradeoffs — autonomy vs. safety and speed vs. correctness — are both expressions of that tension.
Which user segment the architecture should favour (interactive,
programmatic, or ACP) is [OPEN] — Q1.2.2 / Q1.6 — so the relative
weight of the interactive and the unattended (CI) scenarios below
cannot be settled. This is risk [R-0].
Architectural Approaches Examined
| ID | Approach | Source |
|---|---|---|
AP-1 |
Hexagonal ports and adapters |
ADR-001 |
AP-2 |
Pluggable LLM backend factory + API-style adapters |
ADR-002 |
AP-3 |
Conversation middleware pipeline (turn/price/compaction) |
ADR-003 |
AP-4 |
File-based session persistence (folder + JSONL) |
ADR-004 |
AP-5 |
Tiered tool permissions + agent safety profiles |
arc42 §8.2, BR-3/BR-4 |
AP-6 |
Trust-folder gate |
arc42 §8.2, BR-1 |
AP-7 |
Working-directory boundary for file tools |
arc42 §8.2, T-003 |
AP-8 |
Retry/backoff + auto-compaction recovery |
arc42 §8.5, §6 |
AP-9 |
Concurrent (asyncio) tool execution within a turn |
arc42 §6 |
Quality-Attribute Utility Tree
Each leaf is a scenario, tagged (importance, architectural risk) on a
High/Medium/Low scale. Both axes are provisional per [R-0].
-
Security / Safety
-
Tool-execution control — SC-1 — Destructive command in an unattended CI run ⚑ Deferred (H, H), SC-2 — Calibrated autonomy in interactive
accept-edits(H, M) -
Boundary enforcement — SC-3 — First entry into an untrusted cloned repository (H, M), SC-4 — File write aimed outside the working directory ⚑ Deferred (H, M)
-
-
Usability
-
Low-friction control — SC-2 — Calibrated autonomy in interactive
accept-edits(M, M), SC-3 — First entry into an untrusted cloned repository (M, L)
-
-
Reliability
-
Recovery from overflow / outage — SC-6 — Context auto-compaction on a long session ⚑ Deferred (H, M), SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred (M, M)
-
-
Performance Efficiency
-
Turn throughput — SC-5 — Concurrent edits to the same file in one turn (M, H), SC-8 — Byte-capped read of a large file (M, M)
-
Cost bounding — SC-7 —
--max-priceceiling overshoot (M, M)
-
-
Maintainability / Compatibility
-
Extensibility — SC-10 — A contributor adds a new LLM provider (M, L), SC-11 — Editor integration over ACP (M, M)
-
Scenario Analysis
Each scenario gives stimulus, environment, response and measure, then the architectural decision it exercises and the S/T/R points it reveals. ⚑ Deferred marks a scenario whose evaluation rests on an unanswered Question-Tree leaf.
Autonomy vs. Safety
SC-1 — Destructive command in an unattended CI run ⚑ Deferred
- Stimulus
-
The LLM emits a destructive shell command (e.g. a recursive delete) during
vibe -pin a CI pipeline. - Environment
-
Programmatic mode, which forces the
auto-approveagent (BR-5). - Response
-
There is no human approval prompt. Only the
bashallow/deny prefix lists and arity checks stand between the LLM and execution (arc42 §8.2);sudoalways asks, but a non-sudodestructive command is not on any deny list by default. - Response measure
-
Undefined — no arc42 quality scenario measures the fraction of destructive commands the allow/deny list blocks.
- Decision
-
AP-5 + BR-5.
- Analysis
-
This is the sharpest autonomy-vs-safety tradeoff in the system. Programmatic mode trades the entire human safety gate for unattended throughput. The arc42 Security concept (§8.2) lists "Tiered tool permissions" as the mitigation for threat T-001, but in
auto-approvetheASKtier collapses toALWAYS— the mitigation is disabled by the very mode that most needs it. - Flag
-
Rests on Q3.8.1 (no STRIDE threat model — the residual risk of T-001 cannot be claimed bounded) and Q4.9 (whether Safety outranks the unattended-throughput use case is unranked). → [T-1], [R-1].
SC-2 — Calibrated autonomy in interactive accept-edits
- Stimulus
-
An interactive developer selects the
accept-editsprofile; the LLM proposes a file edit and, separately, a shell command. - Environment
-
Interactive TUI.
- Response
-
The edit auto-applies without a prompt; the mutating shell command still resolves to
ASKand prompts (arc42 §8.2, BR-4). - Response measure
-
Edits applied with zero prompts; mutating
bashprompts 100 % of the time. - Decision
-
AP-5 (agent safety profiles).
- Analysis
-
This is the architecture’s designed middle of the autonomy-safety spectrum, and it is sound: the per-tool permission tier is orthogonal to the per-profile autonomy level, so the profile can relax edits without relaxing shell execution. → [NR-1], [S-1], [S-2].
SC-3 — First entry into an untrusted cloned repository
- Stimulus
-
A developer
cd`s into a freshly cloned repository whose `.vibe/ships hostile tool, hook and agent definitions. - Environment
-
Interactive TUI, first run in that folder.
- Response
-
The trust-folder gate blocks loading of project config until the developer explicitly accepts (arc42 §8.2, BR-1); the decision persists in
~/.vibe/trusted_folders.toml. - Response measure
-
Untrusted project config loaded: never. One trust prompt per new folder.
- Decision
-
AP-6.
- Analysis
-
The gate cleanly closes threat T-005. The cost is one usability prompt per new folder — a small, bounded friction.
- Flag
-
Mildly rests on Q2.6.BR.intent — whether the trust gate is a hard security guarantee or a convenience is deferred, so its strength cannot be asserted. → [NR-2], [S-6].
SC-4 — File write aimed outside the working directory ⚑ Deferred
- Stimulus
-
The LLM issues a
write_file/search_replacetargeting a path outside the working directory. - Environment
-
Any session.
- Response
-
The working-directory boundary forces an extra approval (arc42 §8.2, T-003);
--add-dirwidens the permitted root set. - Response measure
-
Out-of-boundary writes without approval: none — provided the boundary is a guarantee.
- Decision
-
AP-7.
- Analysis
-
The boundary is an approval trigger, not a hard block — and
--add-dirplus the "extra approval" escape hatch mean an autonomous profile can still write widely. - Flag
-
Rests on Q2.6.BR.intent ("is the workdir boundary a security guarantee or a convenience?", explicitly deferred) and Q3.8.1. Until answered, SC-4’s response cannot be classified as safe or unsafe. → [T-1], [R-2].
Speed vs. Correctness
SC-5 — Concurrent edits to the same file in one turn
- Stimulus
-
In a single turn the LLM emits two
search_replacecalls that touch the same file. - Environment
-
Any session.
- Response
-
The Agent Loop runs tool calls as concurrent
asynciotasks (arc42 §6, Conversation Turn). The arc42 runtime view documents the concurrency but documents no serialization of file writes; the outcome is interleaving-dependent. - Response measure
-
None — arc42 §6 covers only the happy path; there is no scenario for concurrent-write conflict.
- Decision
-
AP-9.
- Analysis
-
Concurrency buys turn speed (the §10 quality tree credits "concurrent tools" to Performance Efficiency) at a correctness cost the documentation does not bound. Two patches to one file can lose an edit or apply against stale content. → [T-3], [R-3].
SC-6 — Context auto-compaction on a long session ⚑ Deferred
- Stimulus
-
An interactive session’s
context_tokenscrosses the model’sauto_compact_threshold. - Environment
-
Long-running interactive session.
- Response
-
AutoCompactMiddlewarereturnsCOMPACT; the loop summarises history into[system, summary]and forks a new session id (arc42 §6). - Response measure
-
Continuity preserved (the session does not fail); fidelity of the summary is not measured anywhere in arc42 §10.
- Decision
-
AP-3 + AP-8.
- Analysis
-
Compaction trades correctness (detail in the dropped history) for reliability/continuity (the session survives). The §10 scenario gives the trigger as a symbolic
auto_compact_thresholdwith a 50 % warning, but no measure of how much task-relevant context survives. - Flag
-
The acceptability of the fidelity loss depends on the Reliability -vs- Functional-Suitability ranking, which is Q4.9 (deferred). → [T-2], [S-3].
SC-7 — --max-price ceiling overshoot
- Stimulus
-
A programmatic run is bounded with
--max-price; the session accumulates cost. - Environment
-
Programmatic mode.
- Response
-
PriceLimitMiddlewarestops the loop whensession_costexceeds the ceiling — butsession_costis a self-described rough estimate that ignores prompt caching (arc42 §11, TD-001). - Response measure
-
The
--max-priceguarantee is approximate; arc42 §11 states this explicitly. The divergence size is unmeasured. - Decision
-
AP-3.
- Analysis
-
Speed of unattended operation (no human watching the bill) is bought against the correctness of the cost bound. The run can stop early (wasting budget) or late (overspending). → [R-4], [S-4].
SC-8 — Byte-capped read of a large file
- Stimulus
-
The LLM reads a file larger than the
read_filecap. - Environment
-
Any session.
- Response
-
Content is returned truncated at 64 000 bytes with
was_truncatedset (arc42 §10). - Response measure
-
Read cap 64 000 bytes/call;
grep100 matches / 64 000 bytes;bash16 000-byte output cap. - Decision
-
Bounded-I/O caps (arc42 §10).
- Analysis
-
The caps bound turn latency and context growth (speed) but hand the LLM partial input (correctness) — and
was_truncatedis the only signal that this happened. A reasoning error built on a truncated read is silent. → [T-5], [S-5].
Reliability, Maintainability, Compatibility
SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred
- Stimulus
-
The LLM provider returns 429s, then becomes unavailable for an extended period.
- Environment
-
Any session.
- Response
-
The Mistral backend retries with exponential backoff — 500 ms initial, 1.5× exponent, 300 s cap (arc42 §8.5, §10). Behaviour after retry exhaustion / on a sustained outage is not specified (arc42 §11, R-009).
- Response measure
-
A single turn can block up to the 300 s cap; the sustained-outage end-state has no measure.
- Decision
-
AP-8.
- Analysis
-
The retry cap is a reliability-vs-performance tradeoff — a hung turn costs up to 300 s of latency.
- Flag
-
The sustained-outage end-state rests on Q5.3 (deferred to Operations). → [T-4], [R-5].
SC-10 — A contributor adds a new LLM provider
- Stimulus
-
A contributor integrates a new provider.
- Environment
-
Development / CI.
- Response
-
A new backend is registered in
BACKEND_FACTORYor covered by a new API-style adapter behind theBackendLikeport; the Agent Loop is unchanged (ADR-002). - Response measure
-
Agent-loop files changed: zero.
- Decision
-
AP-1 + AP-2.
- Analysis
-
Hexagonal ports plus the backend factory deliver Compatibility and Maintainability with no opposing attribute materially harmed — ADR-002’s only negative is config complexity (-1). A clean non-risk. → [NR-3].
SC-11 — Editor integration over ACP
- Stimulus
-
An IDE integrator embeds the agent via
vibe-acp. - Environment
-
ACP server over stdio.
- Response
-
The ACP bridge exposes the engine; ACP-specific tools delegate file/terminal operations to the ACP client (arc42 §5, §6).
- Response measure
-
One engine reused behind two front ends.
- Decision
-
AP-1 (ports) + the ACP bridge.
- Analysis
-
Compatibility is delivered, but
acp/tools/re-implements severalcore/tools/builtins (arc42 §11, TD-002) — a Compatibility-vs -Maintainability tradeoff: the protocol reach costs duplicated tool maintenance. → [T-6], [R-6].
Consolidated Findings
Sensitivity Points
| ID | Sensitivity Point | Attributes sensitive to it |
|---|---|---|
S-1 |
The permission tier ( |
Security, Usability |
S-2 |
The active agent safety profile |
Security, Performance (autonomy) |
S-3 |
The |
Reliability, Functional Suitability (correctness) |
S-4 |
The retry parameters (500 ms / 1.5× / 300 s cap) |
Reliability, Performance (latency) |
S-5 |
The tool I/O byte caps (16 k / 64 k / 100 k) |
Performance, Functional Suitability |
S-6 |
The trust-folder gate |
Security, Usability (first-run friction) |
Tradeoff Points
| ID | Tradeoff Point |
|---|---|
T-1 |
Agent profile + permission tiers (AP-5). Autonomy/throughput
(Performance, Usability) vs. Safety (Security). Sharpest at the
|
T-2 |
Auto-compaction (AP-3/AP-8). Continuity/Reliability vs. context fidelity/Correctness — see SC-6 — Context auto-compaction on a long session ⚑ Deferred. |
T-3 |
Concurrent tool execution (AP-9). Turn speed/Performance vs. write-correctness — see SC-5 — Concurrent edits to the same file in one turn. |
T-4 |
Retry/backoff cap (AP-8). Reliability vs. Performance (a turn may block up to 300 s) — see SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred. |
T-5 |
Bounded-I/O byte caps. Performance vs. Functional Suitability (truncated input) — see SC-8 — Byte-capped read of a large file. |
T-6 |
ACP tool re-implementation (TD-002). Compatibility/protocol reach vs. Maintainability — see SC-11 — Editor integration over ACP. |
Risks
R-0 (master risk) — The priority list is unconfirmed. Q4.9 is [OPEN]
and unanswered. Every weight, ordering and "acceptable tradeoff"
judgement in this report is provisional. The autonomy-vs-safety
tradeoff in particular cannot be resolved without knowing whether Safety
outranks unattended throughput — and the segment priority that would
inform that (Q1.2.2 / Q1.6) is also [OPEN]. ⚑ Deferred.
R-1 — auto-approve disables the headline safety mitigation. In
programmatic mode (BR-5) the ASK tier collapses to ALWAYS, so
"Tiered tool permissions" — the §8.2 mitigation for T-001 — is inert in
the mode that runs unattended. With no STRIDE model (Q3.8.1) the residual
risk is unquantified. ⚑ Deferred (Q3.8.1, Q4.9). See SC-1 — Destructive command in an unattended CI run ⚑ Deferred.
R-2 — The working-directory boundary’s strength is undefined. AP-7 is
an approval trigger, not a hard block, and --add-dir widens it.
Whether it is a security guarantee is explicitly deferred
(Q2.6.BR.intent). ⚑ Deferred. See SC-4 — File write aimed outside the working directory ⚑ Deferred.
R-3 — Concurrent same-file edits have no documented serialization. arc42 §6 documents concurrency but no write-conflict handling; the outcome of two patches to one file is interleaving-dependent. See SC-5 — Concurrent edits to the same file in one turn.
R-4 — The --max-price ceiling is approximate. session_cost ignores
prompt caching (TD-001), so a cost-bounded autonomous run can overshoot
or stop early. See SC-7 — --max-price ceiling overshoot.
R-5 — Sustained-outage behaviour is unspecified. Beyond the 300 s retry cap the end-state is undefined (R-009 / Q5.3). ⚑ Deferred. See SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.
R-6 — ACP duplicates core tools. TD-002: acp/tools/ re-implements
core/tools/ builtins; a tool fix must be made twice. See SC-11 — Editor integration over ACP.
Non-Risks
NR-1 — Orthogonal profile / permission design. The per-tool tier and
the per-profile autonomy level are independent, so accept-edits can
relax edits without relaxing shell execution (SC-2 — Calibrated autonomy in interactive accept-edits).
NR-2 — The trust-folder gate. A clean, bounded mitigation for T-005 at a one-prompt usability cost (SC-3 — First entry into an untrusted cloned repository) — subject only to the mild Q2.6.BR.intent caveat.
NR-3 — Hexagonal ports + backend factory. Compatibility and Maintainability with no opposing attribute materially harmed (SC-10 — A contributor adds a new LLM provider).
Scenarios Resting on a Deferred Question
| Scenario | Deferred leaf | Effect on the evaluation |
|---|---|---|
SC-1 — Destructive command in an unattended CI run ⚑ Deferred |
Q3.8.1, Q4.9 |
Residual T-001 risk cannot be bounded; whether the autonomy gain justifies it cannot be ranked. |
SC-4 — File write aimed outside the working directory ⚑ Deferred |
Q2.6.BR.intent, Q3.8.1 |
The boundary cannot be classified as a guarantee or a convenience. |
Q4.9 |
The acceptability of compaction’s fidelity loss depends on the Reliability-vs-Functional-Suitability rank. |
|
SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred |
Q5.3 |
The sustained-outage end-state is undefined. |
All scenarios |
Q4.9, Q1.2.2, Q1.6 (via [R-0]) |
The priority order, utility-tree weights and every "acceptable" judgement are provisional. |
Conclusion
The architecture’s central decision — the tiered permission system plus
agent safety profiles (AP-5) — is a well-formed tradeoff mechanism:
SC-2 — Calibrated autonomy in interactive accept-edits shows the orthogonal tier/profile design lets autonomy be tuned
without dismantling safety. The serious finding is not the mechanism but
its unattended configuration: programmatic mode forces auto-approve
(BR-5), which inverts the mechanism by collapsing every ASK to
ALWAYS (SC-1 — Destructive command in an unattended CI run ⚑ Deferred, [R-1]).
That finding cannot be closed by architecture work alone. It rests on two unanswered questions — whether Safety is the top quality goal (Q4.9) and whether there is a threat model to bound the residual risk (Q3.8.1) — and on a third, the segment priority (Q1.2.2/Q1.6), that decides how much the unattended use case should weigh at all.
The single highest-value action is not a code change: it is answering Q4.9, Q3.8.1 and Q2.6.BR.intent. Until then this ATAM, like the architecture documentation it evaluates, can only describe the tradeoffs — it cannot adjudicate them.
Findings are recorded for disposition; no fixes were applied.
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.