Vibe Architecture — ATAM Evaluation

Method and Scope

This is an Architecture Tradeoff Analysis Method (ATAM) evaluation of the architecture documented in src/docs/arc42/. ATAM is scenario-based: it maps quality-attribute scenarios onto architectural decisions and finds the sensitivity points, tradeoff points and risks those decisions create. It produces no score — it produces a map of where the architecture is fragile and why.

The evaluation follows the ATAM outputs: business drivers, architectural approaches, a quality-attribute utility tree, scenario analysis, and a consolidated list of sensitivity points (S-n), tradeoff points (T-n), risks (R-n) and non-risks (NR-n).

Important

The requested priority list does not exist

The task asked for "the Q4.9 quality-goal ranking" to be used as the priority list. Q4.9 is an [OPEN] leaf of QUESTION_TREE.adoc and is unanswered in OPEN_QUESTIONS.adoc — its answer block still reads (write here). There is no team-supplied ranking.

This evaluation therefore proceeds on a provisional ranking (Provisional Quality-Goal Priority) derived, as the arc42 documents themselves do, from code-visible emphasis. Every prioritisation in this report — the utility tree weights, the scenario ordering, and which tradeoff is judged "acceptable" — inherits this deferral. This is recorded as the master risk [R-0]; individual scenarios that depend on it carry a ⚑ Deferred flag.

Provisional Quality-Goal Priority

Q4.9 itself notes: "the README theme 'Safety first' and the permission system suggest Security/Usability are prioritised, but the code cannot rank the eight ISO 25010 characteristics." The arc42 Chapter 1.2 names five top goals; Chapter 10 marks three as derived. The provisional order below is used only so the ATAM can proceed — it is not authoritative.

Rank	Quality Goal	Basis (provisional)
1	Security / Safety	"Safety first" README theme; permission system is the most elaborated subsystem (arc42 §8.2).
2	Usability	Q4.9 pairs Usability with Security; TUI, onboarding, autocompletion.
3	Reliability	Retry, auto-compaction, clean cancellation (arc42 §4, §6).
4	Maintainability	Strict CI gates, hexagonal seams (arc42 §2, ADR-001).
5	Compatibility	Multi-provider, MCP, ACP (arc42 §4, ADR-002).
6–8	Performance Efficiency, Functional Suitability, Portability	Chapter 10 "derived" characteristics.

Rank

Quality Goal

Basis (provisional)

Security / Safety

"Safety first" README theme; permission system is the most elaborated subsystem (arc42 §8.2).

Usability

Q4.9 pairs Usability with Security; TUI, onboarding, autocompletion.

Reliability

Retry, auto-compaction, clean cancellation (arc42 §4, §6).

Maintainability

Strict CI gates, hexagonal seams (arc42 §2, ADR-001).

Compatibility

Multi-provider, MCP, ACP (arc42 §4, ADR-002).

6–8

Performance Efficiency, Functional Suitability, Portability

Chapter 10 "derived" characteristics.

Business Drivers

The system is a client-side CLI coding agent that executes shell commands and edits files on a developer’s machine on behalf of an LLM (arc42 §1, §3). The dominant business tension is therefore intrinsic: the agent is useful in proportion to how much it may do without asking, and dangerous in the same proportion. The two named tradeoffs — autonomy vs. safety and speed vs. correctness — are both expressions of that tension.

Which user segment the architecture should favour (interactive, programmatic, or ACP) is [OPEN] — Q1.2.2 / Q1.6 — so the relative weight of the interactive and the unattended (CI) scenarios below cannot be settled. This is risk [R-0].

Architectural Approaches Examined

ID	Approach	Source
AP-1	Hexagonal ports and adapters	ADR-001
AP-2	Pluggable LLM backend factory + API-style adapters	ADR-002
AP-3	Conversation middleware pipeline (turn/price/compaction)	ADR-003
AP-4	File-based session persistence (folder + JSONL)	ADR-004
AP-5	Tiered tool permissions + agent safety profiles	arc42 §8.2, BR-3/BR-4
AP-6	Trust-folder gate	arc42 §8.2, BR-1
AP-7	Working-directory boundary for file tools	arc42 §8.2, T-003
AP-8	Retry/backoff + auto-compaction recovery	arc42 §8.5, §6
AP-9	Concurrent (asyncio) tool execution within a turn	arc42 §6

Approach

Source

AP-1

Hexagonal ports and adapters

ADR-001

AP-2

Pluggable LLM backend factory + API-style adapters

ADR-002

AP-3

Conversation middleware pipeline (turn/price/compaction)

ADR-003

AP-4

File-based session persistence (folder + JSONL)

ADR-004

AP-5

Tiered tool permissions + agent safety profiles

arc42 §8.2, BR-3/BR-4

AP-6

Trust-folder gate

arc42 §8.2, BR-1

AP-7

Working-directory boundary for file tools

arc42 §8.2, T-003

AP-8

Retry/backoff + auto-compaction recovery

arc42 §8.5, §6

AP-9

Concurrent (asyncio) tool execution within a turn

arc42 §6

Quality-Attribute Utility Tree

Each leaf is a scenario, tagged (importance, architectural risk) on a High/Medium/Low scale. Both axes are provisional per [R-0].

Security / Safety
- Tool-execution control — SC-1 — Destructive command in an unattended CI run ⚑ Deferred (H, H), SC-2 — Calibrated autonomy in interactive accept-edits (H, M)
- Boundary enforcement — SC-3 — First entry into an untrusted cloned repository (H, M), SC-4 — File write aimed outside the working directory ⚑ Deferred (H, M)
Usability
- Low-friction control — SC-2 — Calibrated autonomy in interactive accept-edits (M, M), SC-3 — First entry into an untrusted cloned repository (M, L)
Reliability
- Recovery from overflow / outage — SC-6 — Context auto-compaction on a long session ⚑ Deferred (H, M), SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred (M, M)
Performance Efficiency
- Turn throughput — SC-5 — Concurrent edits to the same file in one turn (M, H), SC-8 — Byte-capped read of a large file (M, M)
- Cost bounding — SC-7 — --max-price ceiling overshoot (M, M)
Maintainability / Compatibility
- Extensibility — SC-10 — A contributor adds a new LLM provider (M, L), SC-11 — Editor integration over ACP (M, M)

Scenario Analysis

Each scenario gives stimulus, environment, response and measure, then the architectural decision it exercises and the S/T/R points it reveals. ⚑ Deferred marks a scenario whose evaluation rests on an unanswered Question-Tree leaf.

Autonomy vs. Safety

SC-1 — Destructive command in an unattended CI run ⚑ Deferred

Stimulus: The LLM emits a destructive shell command (e.g. a recursive delete) during vibe -p in a CI pipeline.
Environment: Programmatic mode, which forces the auto-approve agent (BR-5).
Response: There is no human approval prompt. Only the bash allow/deny prefix lists and arity checks stand between the LLM and execution (arc42 §8.2); sudo always asks, but a non-sudo destructive command is not on any deny list by default.
Response measure: Undefined — no arc42 quality scenario measures the fraction of destructive commands the allow/deny list blocks.
Decision: AP-5 + BR-5.
Analysis: This is the sharpest autonomy-vs-safety tradeoff in the system. Programmatic mode trades the entire human safety gate for unattended throughput. The arc42 Security concept (§8.2) lists "Tiered tool permissions" as the mitigation for threat T-001, but in auto-approve the ASK tier collapses to ALWAYS — the mitigation is disabled by the very mode that most needs it.
Flag: Rests on Q3.8.1 (no STRIDE threat model — the residual risk of T-001 cannot be claimed bounded) and Q4.9 (whether Safety outranks the unattended-throughput use case is unranked). → [T-1], [R-1].

SC-2 — Calibrated autonomy in interactive `accept-edits`

Stimulus: An interactive developer selects the accept-edits profile; the LLM proposes a file edit and, separately, a shell command.
Environment: Interactive TUI.
Response: The edit auto-applies without a prompt; the mutating shell command still resolves to ASK and prompts (arc42 §8.2, BR-4).
Response measure: Edits applied with zero prompts; mutating bash prompts 100 % of the time.
Decision: AP-5 (agent safety profiles).
Analysis: This is the architecture’s designed middle of the autonomy-safety spectrum, and it is sound: the per-tool permission tier is orthogonal to the per-profile autonomy level, so the profile can relax edits without relaxing shell execution. → [NR-1], [S-1], [S-2].

SC-3 — First entry into an untrusted cloned repository

Stimulus: A developer cd`s into a freshly cloned repository whose `.vibe/ ships hostile tool, hook and agent definitions.
Environment: Interactive TUI, first run in that folder.
Response: The trust-folder gate blocks loading of project config until the developer explicitly accepts (arc42 §8.2, BR-1); the decision persists in ~/.vibe/trusted_folders.toml.
Response measure: Untrusted project config loaded: never. One trust prompt per new folder.
Decision: AP-6.
Analysis: The gate cleanly closes threat T-005. The cost is one usability prompt per new folder — a small, bounded friction.
Flag: Mildly rests on Q2.6.BR.intent — whether the trust gate is a hard security guarantee or a convenience is deferred, so its strength cannot be asserted. → [NR-2], [S-6].

SC-4 — File write aimed outside the working directory ⚑ Deferred

Stimulus: The LLM issues a write_file / search_replace targeting a path outside the working directory.
Environment: Any session.
Response: The working-directory boundary forces an extra approval (arc42 §8.2, T-003); --add-dir widens the permitted root set.
Response measure: Out-of-boundary writes without approval: none — provided the boundary is a guarantee.
Decision: AP-7.
Analysis: The boundary is an approval trigger, not a hard block — and --add-dir plus the "extra approval" escape hatch mean an autonomous profile can still write widely.
Flag: Rests on Q2.6.BR.intent ("is the workdir boundary a security guarantee or a convenience?", explicitly deferred) and Q3.8.1. Until answered, SC-4’s response cannot be classified as safe or unsafe. → [T-1], [R-2].

Speed vs. Correctness

SC-5 — Concurrent edits to the same file in one turn

Stimulus: In a single turn the LLM emits two search_replace calls that touch the same file.
Environment: Any session.
Response: The Agent Loop runs tool calls as concurrent asyncio tasks (arc42 §6, Conversation Turn). The arc42 runtime view documents the concurrency but documents no serialization of file writes; the outcome is interleaving-dependent.
Response measure: None — arc42 §6 covers only the happy path; there is no scenario for concurrent-write conflict.
Decision: AP-9.
Analysis: Concurrency buys turn speed (the §10 quality tree credits "concurrent tools" to Performance Efficiency) at a correctness cost the documentation does not bound. Two patches to one file can lose an edit or apply against stale content. → [T-3], [R-3].

SC-6 — Context auto-compaction on a long session ⚑ Deferred

Stimulus: An interactive session’s context_tokens crosses the model’s auto_compact_threshold.
Environment: Long-running interactive session.
Response: AutoCompactMiddleware returns COMPACT; the loop summarises history into [system, summary] and forks a new session id (arc42 §6).
Response measure: Continuity preserved (the session does not fail); fidelity of the summary is not measured anywhere in arc42 §10.
Decision: AP-3 + AP-8.
Analysis: Compaction trades correctness (detail in the dropped history) for reliability/continuity (the session survives). The §10 scenario gives the trigger as a symbolic auto_compact_threshold with a 50 % warning, but no measure of how much task-relevant context survives.
Flag: The acceptability of the fidelity loss depends on the Reliability -vs- Functional-Suitability ranking, which is Q4.9 (deferred). → [T-2], [S-3].

SC-7 — `--max-price` ceiling overshoot

Stimulus: A programmatic run is bounded with --max-price; the session accumulates cost.
Environment: Programmatic mode.
Response: PriceLimitMiddleware stops the loop when session_cost exceeds the ceiling — but session_cost is a self-described rough estimate that ignores prompt caching (arc42 §11, TD-001).
Response measure: The --max-price guarantee is approximate; arc42 §11 states this explicitly. The divergence size is unmeasured.
Decision: AP-3.
Analysis: Speed of unattended operation (no human watching the bill) is bought against the correctness of the cost bound. The run can stop early (wasting budget) or late (overspending). → [R-4], [S-4].

SC-8 — Byte-capped read of a large file

Stimulus: The LLM reads a file larger than the read_file cap.
Environment: Any session.
Response: Content is returned truncated at 64 000 bytes with was_truncated set (arc42 §10).
Response measure: Read cap 64 000 bytes/call; grep 100 matches / 64 000 bytes; bash 16 000-byte output cap.
Decision: Bounded-I/O caps (arc42 §10).
Analysis: The caps bound turn latency and context growth (speed) but hand the LLM partial input (correctness) — and was_truncated is the only signal that this happened. A reasoning error built on a truncated read is silent. → [T-5], [S-5].

Reliability, Maintainability, Compatibility

SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred

Stimulus: The LLM provider returns 429s, then becomes unavailable for an extended period.
Environment: Any session.
Response: The Mistral backend retries with exponential backoff — 500 ms initial, 1.5× exponent, 300 s cap (arc42 §8.5, §10). Behaviour after retry exhaustion / on a sustained outage is not specified (arc42 §11, R-009).
Response measure: A single turn can block up to the 300 s cap; the sustained-outage end-state has no measure.
Decision: AP-8.
Analysis: The retry cap is a reliability-vs-performance tradeoff — a hung turn costs up to 300 s of latency.
Flag: The sustained-outage end-state rests on Q5.3 (deferred to Operations). → [T-4], [R-5].

SC-10 — A contributor adds a new LLM provider

Stimulus: A contributor integrates a new provider.
Environment: Development / CI.
Response: A new backend is registered in BACKEND_FACTORY or covered by a new API-style adapter behind the BackendLike port; the Agent Loop is unchanged (ADR-002).
Response measure: Agent-loop files changed: zero.
Decision: AP-1 + AP-2.
Analysis: Hexagonal ports plus the backend factory deliver Compatibility and Maintainability with no opposing attribute materially harmed — ADR-002’s only negative is config complexity (-1). A clean non-risk. → [NR-3].

SC-11 — Editor integration over ACP

Stimulus: An IDE integrator embeds the agent via vibe-acp.
Environment: ACP server over stdio.
Response: The ACP bridge exposes the engine; ACP-specific tools delegate file/terminal operations to the ACP client (arc42 §5, §6).
Response measure: One engine reused behind two front ends.
Decision: AP-1 (ports) + the ACP bridge.
Analysis: Compatibility is delivered, but acp/tools/ re-implements several core/tools/ builtins (arc42 §11, TD-002) — a Compatibility-vs -Maintainability tradeoff: the protocol reach costs duplicated tool maintenance. → [T-6], [R-6].

Consolidated Findings

Sensitivity Points

ID Sensitivity Point Attributes sensitive to it

ID	Sensitivity Point	Attributes sensitive to it
S-1	The permission tier (`ALWAYS`/`ASK`/`NEVER`) assigned to each tool	Security, Usability
S-2	The active agent safety profile	Security, Performance (autonomy)
S-3	The `auto_compact_threshold` value	Reliability, Functional Suitability (correctness)
S-4	The retry parameters (500 ms / 1.5× / 300 s cap)	Reliability, Performance (latency)
S-5	The tool I/O byte caps (16 k / 64 k / 100 k)	Performance, Functional Suitability
S-6	The trust-folder gate	Security, Usability (first-run friction)

S-1

The permission tier (ALWAYS/ASK/NEVER) assigned to each tool

Security, Usability

S-2

The active agent safety profile

Security, Performance (autonomy)

S-3

The auto_compact_threshold value

Reliability, Functional Suitability (correctness)

S-4

The retry parameters (500 ms / 1.5× / 300 s cap)

Reliability, Performance (latency)

S-5

The tool I/O byte caps (16 k / 64 k / 100 k)

Performance, Functional Suitability

S-6

The trust-folder gate

Security, Usability (first-run friction)

Tradeoff Points

ID Tradeoff Point

ID	Tradeoff Point
T-1	Agent profile + permission tiers (AP-5). Autonomy/throughput (Performance, Usability) vs. Safety (Security). Sharpest at the `auto-approve` profile — see SC-1 — Destructive command in an unattended CI run ⚑ Deferred, SC-4 — File write aimed outside the working directory ⚑ Deferred.
T-2	Auto-compaction (AP-3/AP-8). Continuity/Reliability vs. context fidelity/Correctness — see SC-6 — Context auto-compaction on a long session ⚑ Deferred.
T-3	Concurrent tool execution (AP-9). Turn speed/Performance vs. write-correctness — see SC-5 — Concurrent edits to the same file in one turn.
T-4	Retry/backoff cap (AP-8). Reliability vs. Performance (a turn may block up to 300 s) — see SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.
T-5	Bounded-I/O byte caps. Performance vs. Functional Suitability (truncated input) — see SC-8 — Byte-capped read of a large file.
T-6	ACP tool re-implementation (TD-002). Compatibility/protocol reach vs. Maintainability — see SC-11 — Editor integration over ACP.

T-1

Agent profile + permission tiers (AP-5). Autonomy/throughput (Performance, Usability) vs. Safety (Security). Sharpest at the auto-approve profile — see SC-1 — Destructive command in an unattended CI run ⚑ Deferred, SC-4 — File write aimed outside the working directory ⚑ Deferred.

T-2

Auto-compaction (AP-3/AP-8). Continuity/Reliability vs. context fidelity/Correctness — see SC-6 — Context auto-compaction on a long session ⚑ Deferred.

T-3

Concurrent tool execution (AP-9). Turn speed/Performance vs. write-correctness — see SC-5 — Concurrent edits to the same file in one turn.

T-4

Retry/backoff cap (AP-8). Reliability vs. Performance (a turn may block up to 300 s) — see SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.

T-5

Bounded-I/O byte caps. Performance vs. Functional Suitability (truncated input) — see SC-8 — Byte-capped read of a large file.

T-6

ACP tool re-implementation (TD-002). Compatibility/protocol reach vs. Maintainability — see SC-11 — Editor integration over ACP.

Risks

R-0 (master risk) — The priority list is unconfirmed. Q4.9 is [OPEN] and unanswered. Every weight, ordering and "acceptable tradeoff" judgement in this report is provisional. The autonomy-vs-safety tradeoff in particular cannot be resolved without knowing whether Safety outranks unattended throughput — and the segment priority that would inform that (Q1.2.2 / Q1.6) is also [OPEN]. ⚑ Deferred.

R-1 — auto-approve disables the headline safety mitigation. In programmatic mode (BR-5) the ASK tier collapses to ALWAYS, so "Tiered tool permissions" — the §8.2 mitigation for T-001 — is inert in the mode that runs unattended. With no STRIDE model (Q3.8.1) the residual risk is unquantified. ⚑ Deferred (Q3.8.1, Q4.9). See SC-1 — Destructive command in an unattended CI run ⚑ Deferred.

R-2 — The working-directory boundary’s strength is undefined. AP-7 is an approval trigger, not a hard block, and --add-dir widens it. Whether it is a security guarantee is explicitly deferred (Q2.6.BR.intent). ⚑ Deferred. See SC-4 — File write aimed outside the working directory ⚑ Deferred.

R-3 — Concurrent same-file edits have no documented serialization. arc42 §6 documents concurrency but no write-conflict handling; the outcome of two patches to one file is interleaving-dependent. See SC-5 — Concurrent edits to the same file in one turn.

R-4 — The --max-price ceiling is approximate. session_cost ignores prompt caching (TD-001), so a cost-bounded autonomous run can overshoot or stop early. See SC-7 — --max-price ceiling overshoot.

R-5 — Sustained-outage behaviour is unspecified. Beyond the 300 s retry cap the end-state is undefined (R-009 / Q5.3). ⚑ Deferred. See SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.

R-6 — ACP duplicates core tools. TD-002: acp/tools/ re-implements core/tools/ builtins; a tool fix must be made twice. See SC-11 — Editor integration over ACP.

Non-Risks

NR-1 — Orthogonal profile / permission design. The per-tool tier and the per-profile autonomy level are independent, so accept-edits can relax edits without relaxing shell execution (SC-2 — Calibrated autonomy in interactive accept-edits).

NR-2 — The trust-folder gate. A clean, bounded mitigation for T-005 at a one-prompt usability cost (SC-3 — First entry into an untrusted cloned repository) — subject only to the mild Q2.6.BR.intent caveat.

NR-3 — Hexagonal ports + backend factory. Compatibility and Maintainability with no opposing attribute materially harmed (SC-10 — A contributor adds a new LLM provider).

Scenarios Resting on a Deferred Question

Scenario	Deferred leaf	Effect on the evaluation
SC-1 — Destructive command in an unattended CI run ⚑ Deferred	Q3.8.1, Q4.9	Residual T-001 risk cannot be bounded; whether the autonomy gain justifies it cannot be ranked.
SC-4 — File write aimed outside the working directory ⚑ Deferred	Q2.6.BR.intent, Q3.8.1	The boundary cannot be classified as a guarantee or a convenience.
SC-6 — Context auto-compaction on a long session ⚑ Deferred	Q4.9	The acceptability of compaction’s fidelity loss depends on the Reliability-vs-Functional-Suitability rank.
SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred	Q5.3	The sustained-outage end-state is undefined.
All scenarios	Q4.9, Q1.2.2, Q1.6 (via [R-0])	The priority order, utility-tree weights and every "acceptable" judgement are provisional.

Scenario

Deferred leaf

Effect on the evaluation

SC-1 — Destructive command in an unattended CI run ⚑ Deferred

Q3.8.1, Q4.9

Residual T-001 risk cannot be bounded; whether the autonomy gain justifies it cannot be ranked.

SC-4 — File write aimed outside the working directory ⚑ Deferred

Q2.6.BR.intent, Q3.8.1

The boundary cannot be classified as a guarantee or a convenience.

SC-6 — Context auto-compaction on a long session ⚑ Deferred

Q4.9

The acceptability of compaction’s fidelity loss depends on the Reliability-vs-Functional-Suitability rank.

SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred

Q5.3

The sustained-outage end-state is undefined.

All scenarios

Q4.9, Q1.2.2, Q1.6 (via [R-0])

The priority order, utility-tree weights and every "acceptable" judgement are provisional.

Conclusion

The architecture’s central decision — the tiered permission system plus agent safety profiles (AP-5) — is a well-formed tradeoff mechanism: SC-2 — Calibrated autonomy in interactive accept-edits shows the orthogonal tier/profile design lets autonomy be tuned without dismantling safety. The serious finding is not the mechanism but its unattended configuration: programmatic mode forces auto-approve (BR-5), which inverts the mechanism by collapsing every ASK to ALWAYS (SC-1 — Destructive command in an unattended CI run ⚑ Deferred, [R-1]).

That finding cannot be closed by architecture work alone. It rests on two unanswered questions — whether Safety is the top quality goal (Q4.9) and whether there is a threat model to bound the residual risk (Q3.8.1) — and on a third, the segment priority (Q1.2.2/Q1.6), that decides how much the unattended use case should weigh at all.

The single highest-value action is not a code change: it is answering Q4.9, Q3.8.1 and Q2.6.BR.intent. Until then this ATAM, like the architecture documentation it evaluates, can only describe the tradeoffs — it cannot adjudicate them.

Findings are recorded for disposition; no fixes were applied.

Feedback

Was this page helpful?

Glad to hear it! Please tell us how we can improve.

Sorry to hear that. Please tell us how we can improve.