Vibe Architecture — ATAM Evaluation

Method and Scope

This is an Architecture Tradeoff Analysis Method (ATAM) evaluation of the architecture documented in src/docs/arc42/. ATAM is scenario-based: it maps quality-attribute scenarios onto architectural decisions and finds the sensitivity points, tradeoff points and risks those decisions create. It produces no score — it produces a map of where the architecture is fragile and why.

The evaluation follows the ATAM outputs: business drivers, architectural approaches, a quality-attribute utility tree, scenario analysis, and a consolidated list of sensitivity points (S-n), tradeoff points (T-n), risks (R-n) and non-risks (NR-n).

Important
The requested priority list does not exist

The task asked for "the Q4.9 quality-goal ranking" to be used as the priority list. Q4.9 is an [OPEN] leaf of QUESTION_TREE.adoc and is unanswered in OPEN_QUESTIONS.adoc — its answer block still reads (write here). There is no team-supplied ranking.

This evaluation therefore proceeds on a provisional ranking (Provisional Quality-Goal Priority) derived, as the arc42 documents themselves do, from code-visible emphasis. Every prioritisation in this report — the utility tree weights, the scenario ordering, and which tradeoff is judged "acceptable" — inherits this deferral. This is recorded as the master risk [R-0]; individual scenarios that depend on it carry a ⚑ Deferred flag.

Provisional Quality-Goal Priority

Q4.9 itself notes: "the README theme 'Safety first' and the permission system suggest Security/Usability are prioritised, but the code cannot rank the eight ISO 25010 characteristics." The arc42 Chapter 1.2 names five top goals; Chapter 10 marks three as derived. The provisional order below is used only so the ATAM can proceed — it is not authoritative.

Rank Quality Goal Basis (provisional)

1

Security / Safety

"Safety first" README theme; permission system is the most elaborated subsystem (arc42 §8.2).

2

Usability

Q4.9 pairs Usability with Security; TUI, onboarding, autocompletion.

3

Reliability

Retry, auto-compaction, clean cancellation (arc42 §4, §6).

4

Maintainability

Strict CI gates, hexagonal seams (arc42 §2, ADR-001).

5

Compatibility

Multi-provider, MCP, ACP (arc42 §4, ADR-002).

6–8

Performance Efficiency, Functional Suitability, Portability

Chapter 10 "derived" characteristics.

Business Drivers

The system is a client-side CLI coding agent that executes shell commands and edits files on a developer’s machine on behalf of an LLM (arc42 §1, §3). The dominant business tension is therefore intrinsic: the agent is useful in proportion to how much it may do without asking, and dangerous in the same proportion. The two named tradeoffs — autonomy vs. safety and speed vs. correctness — are both expressions of that tension.

Which user segment the architecture should favour (interactive, programmatic, or ACP) is [OPEN] — Q1.2.2 / Q1.6 — so the relative weight of the interactive and the unattended (CI) scenarios below cannot be settled. This is risk [R-0].

Architectural Approaches Examined

ID Approach Source

AP-1

Hexagonal ports and adapters

ADR-001

AP-2

Pluggable LLM backend factory + API-style adapters

ADR-002

AP-3

Conversation middleware pipeline (turn/price/compaction)

ADR-003

AP-4

File-based session persistence (folder + JSONL)

ADR-004

AP-5

Tiered tool permissions + agent safety profiles

arc42 §8.2, BR-3/BR-4

AP-6

Trust-folder gate

arc42 §8.2, BR-1

AP-7

Working-directory boundary for file tools

arc42 §8.2, T-003

AP-8

Retry/backoff + auto-compaction recovery

arc42 §8.5, §6

AP-9

Concurrent (asyncio) tool execution within a turn

arc42 §6

Quality-Attribute Utility Tree

Each leaf is a scenario, tagged (importance, architectural risk) on a High/Medium/Low scale. Both axes are provisional per [R-0].

Scenario Analysis

Each scenario gives stimulus, environment, response and measure, then the architectural decision it exercises and the S/T/R points it reveals. ⚑ Deferred marks a scenario whose evaluation rests on an unanswered Question-Tree leaf.

Autonomy vs. Safety

SC-1 — Destructive command in an unattended CI run ⚑ Deferred

Stimulus

The LLM emits a destructive shell command (e.g. a recursive delete) during vibe -p in a CI pipeline.

Environment

Programmatic mode, which forces the auto-approve agent (BR-5).

Response

There is no human approval prompt. Only the bash allow/deny prefix lists and arity checks stand between the LLM and execution (arc42 §8.2); sudo always asks, but a non-sudo destructive command is not on any deny list by default.

Response measure

Undefined — no arc42 quality scenario measures the fraction of destructive commands the allow/deny list blocks.

Decision

AP-5 + BR-5.

Analysis

This is the sharpest autonomy-vs-safety tradeoff in the system. Programmatic mode trades the entire human safety gate for unattended throughput. The arc42 Security concept (§8.2) lists "Tiered tool permissions" as the mitigation for threat T-001, but in auto-approve the ASK tier collapses to ALWAYS — the mitigation is disabled by the very mode that most needs it.

Flag

Rests on Q3.8.1 (no STRIDE threat model — the residual risk of T-001 cannot be claimed bounded) and Q4.9 (whether Safety outranks the unattended-throughput use case is unranked). → [T-1], [R-1].

SC-2 — Calibrated autonomy in interactive accept-edits

Stimulus

An interactive developer selects the accept-edits profile; the LLM proposes a file edit and, separately, a shell command.

Environment

Interactive TUI.

Response

The edit auto-applies without a prompt; the mutating shell command still resolves to ASK and prompts (arc42 §8.2, BR-4).

Response measure

Edits applied with zero prompts; mutating bash prompts 100 % of the time.

Decision

AP-5 (agent safety profiles).

Analysis

This is the architecture’s designed middle of the autonomy-safety spectrum, and it is sound: the per-tool permission tier is orthogonal to the per-profile autonomy level, so the profile can relax edits without relaxing shell execution. → [NR-1], [S-1], [S-2].

SC-3 — First entry into an untrusted cloned repository

Stimulus

A developer cd`s into a freshly cloned repository whose `.vibe/ ships hostile tool, hook and agent definitions.

Environment

Interactive TUI, first run in that folder.

Response

The trust-folder gate blocks loading of project config until the developer explicitly accepts (arc42 §8.2, BR-1); the decision persists in ~/.vibe/trusted_folders.toml.

Response measure

Untrusted project config loaded: never. One trust prompt per new folder.

Decision

AP-6.

Analysis

The gate cleanly closes threat T-005. The cost is one usability prompt per new folder — a small, bounded friction.

Flag

Mildly rests on Q2.6.BR.intent — whether the trust gate is a hard security guarantee or a convenience is deferred, so its strength cannot be asserted. → [NR-2], [S-6].

SC-4 — File write aimed outside the working directory ⚑ Deferred

Stimulus

The LLM issues a write_file / search_replace targeting a path outside the working directory.

Environment

Any session.

Response

The working-directory boundary forces an extra approval (arc42 §8.2, T-003); --add-dir widens the permitted root set.

Response measure

Out-of-boundary writes without approval: none — provided the boundary is a guarantee.

Decision

AP-7.

Analysis

The boundary is an approval trigger, not a hard block — and --add-dir plus the "extra approval" escape hatch mean an autonomous profile can still write widely.

Flag

Rests on Q2.6.BR.intent ("is the workdir boundary a security guarantee or a convenience?", explicitly deferred) and Q3.8.1. Until answered, SC-4’s response cannot be classified as safe or unsafe. → [T-1], [R-2].

Speed vs. Correctness

SC-5 — Concurrent edits to the same file in one turn

Stimulus

In a single turn the LLM emits two search_replace calls that touch the same file.

Environment

Any session.

Response

The Agent Loop runs tool calls as concurrent asyncio tasks (arc42 §6, Conversation Turn). The arc42 runtime view documents the concurrency but documents no serialization of file writes; the outcome is interleaving-dependent.

Response measure

None — arc42 §6 covers only the happy path; there is no scenario for concurrent-write conflict.

Decision

AP-9.

Analysis

Concurrency buys turn speed (the §10 quality tree credits "concurrent tools" to Performance Efficiency) at a correctness cost the documentation does not bound. Two patches to one file can lose an edit or apply against stale content. → [T-3], [R-3].

SC-6 — Context auto-compaction on a long session ⚑ Deferred

Stimulus

An interactive session’s context_tokens crosses the model’s auto_compact_threshold.

Environment

Long-running interactive session.

Response

AutoCompactMiddleware returns COMPACT; the loop summarises history into [system, summary] and forks a new session id (arc42 §6).

Response measure

Continuity preserved (the session does not fail); fidelity of the summary is not measured anywhere in arc42 §10.

Decision

AP-3 + AP-8.

Analysis

Compaction trades correctness (detail in the dropped history) for reliability/continuity (the session survives). The §10 scenario gives the trigger as a symbolic auto_compact_threshold with a 50 % warning, but no measure of how much task-relevant context survives.

Flag

The acceptability of the fidelity loss depends on the Reliability -vs- Functional-Suitability ranking, which is Q4.9 (deferred). → [T-2], [S-3].

SC-7 — --max-price ceiling overshoot

Stimulus

A programmatic run is bounded with --max-price; the session accumulates cost.

Environment

Programmatic mode.

Response

PriceLimitMiddleware stops the loop when session_cost exceeds the ceiling — but session_cost is a self-described rough estimate that ignores prompt caching (arc42 §11, TD-001).

Response measure

The --max-price guarantee is approximate; arc42 §11 states this explicitly. The divergence size is unmeasured.

Decision

AP-3.

Analysis

Speed of unattended operation (no human watching the bill) is bought against the correctness of the cost bound. The run can stop early (wasting budget) or late (overspending). → [R-4], [S-4].

SC-8 — Byte-capped read of a large file

Stimulus

The LLM reads a file larger than the read_file cap.

Environment

Any session.

Response

Content is returned truncated at 64 000 bytes with was_truncated set (arc42 §10).

Response measure

Read cap 64 000 bytes/call; grep 100 matches / 64 000 bytes; bash 16 000-byte output cap.

Decision

Bounded-I/O caps (arc42 §10).

Analysis

The caps bound turn latency and context growth (speed) but hand the LLM partial input (correctness) — and was_truncated is the only signal that this happened. A reasoning error built on a truncated read is silent. → [T-5], [S-5].

Reliability, Maintainability, Compatibility

SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred

Stimulus

The LLM provider returns 429s, then becomes unavailable for an extended period.

Environment

Any session.

Response

The Mistral backend retries with exponential backoff — 500 ms initial, 1.5× exponent, 300 s cap (arc42 §8.5, §10). Behaviour after retry exhaustion / on a sustained outage is not specified (arc42 §11, R-009).

Response measure

A single turn can block up to the 300 s cap; the sustained-outage end-state has no measure.

Decision

AP-8.

Analysis

The retry cap is a reliability-vs-performance tradeoff — a hung turn costs up to 300 s of latency.

Flag

The sustained-outage end-state rests on Q5.3 (deferred to Operations). → [T-4], [R-5].

SC-10 — A contributor adds a new LLM provider

Stimulus

A contributor integrates a new provider.

Environment

Development / CI.

Response

A new backend is registered in BACKEND_FACTORY or covered by a new API-style adapter behind the BackendLike port; the Agent Loop is unchanged (ADR-002).

Response measure

Agent-loop files changed: zero.

Decision

AP-1 + AP-2.

Analysis

Hexagonal ports plus the backend factory deliver Compatibility and Maintainability with no opposing attribute materially harmed — ADR-002’s only negative is config complexity (-1). A clean non-risk. → [NR-3].

SC-11 — Editor integration over ACP

Stimulus

An IDE integrator embeds the agent via vibe-acp.

Environment

ACP server over stdio.

Response

The ACP bridge exposes the engine; ACP-specific tools delegate file/terminal operations to the ACP client (arc42 §5, §6).

Response measure

One engine reused behind two front ends.

Decision

AP-1 (ports) + the ACP bridge.

Analysis

Compatibility is delivered, but acp/tools/ re-implements several core/tools/ builtins (arc42 §11, TD-002) — a Compatibility-vs -Maintainability tradeoff: the protocol reach costs duplicated tool maintenance. → [T-6], [R-6].

Consolidated Findings

Sensitivity Points

ID Sensitivity Point Attributes sensitive to it

S-1

The permission tier (ALWAYS/ASK/NEVER) assigned to each tool

Security, Usability

S-2

The active agent safety profile

Security, Performance (autonomy)

S-3

The auto_compact_threshold value

Reliability, Functional Suitability (correctness)

S-4

The retry parameters (500 ms / 1.5× / 300 s cap)

Reliability, Performance (latency)

S-5

The tool I/O byte caps (16 k / 64 k / 100 k)

Performance, Functional Suitability

S-6

The trust-folder gate

Security, Usability (first-run friction)

Tradeoff Points

ID Tradeoff Point

T-1

Agent profile + permission tiers (AP-5). Autonomy/throughput (Performance, Usability) vs. Safety (Security). Sharpest at the auto-approve profile — see SC-1 — Destructive command in an unattended CI run ⚑ Deferred, SC-4 — File write aimed outside the working directory ⚑ Deferred.

T-2

Auto-compaction (AP-3/AP-8). Continuity/Reliability vs. context fidelity/Correctness — see SC-6 — Context auto-compaction on a long session ⚑ Deferred.

T-3

Concurrent tool execution (AP-9). Turn speed/Performance vs. write-correctness — see SC-5 — Concurrent edits to the same file in one turn.

T-4

Retry/backoff cap (AP-8). Reliability vs. Performance (a turn may block up to 300 s) — see SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.

T-5

Bounded-I/O byte caps. Performance vs. Functional Suitability (truncated input) — see SC-8 — Byte-capped read of a large file.

T-6

ACP tool re-implementation (TD-002). Compatibility/protocol reach vs. Maintainability — see SC-11 — Editor integration over ACP.

Risks

R-0 (master risk) — The priority list is unconfirmed. Q4.9 is [OPEN] and unanswered. Every weight, ordering and "acceptable tradeoff" judgement in this report is provisional. The autonomy-vs-safety tradeoff in particular cannot be resolved without knowing whether Safety outranks unattended throughput — and the segment priority that would inform that (Q1.2.2 / Q1.6) is also [OPEN]. ⚑ Deferred.

R-1 — auto-approve disables the headline safety mitigation. In programmatic mode (BR-5) the ASK tier collapses to ALWAYS, so "Tiered tool permissions" — the §8.2 mitigation for T-001 — is inert in the mode that runs unattended. With no STRIDE model (Q3.8.1) the residual risk is unquantified. ⚑ Deferred (Q3.8.1, Q4.9). See SC-1 — Destructive command in an unattended CI run ⚑ Deferred.

R-2 — The working-directory boundary’s strength is undefined. AP-7 is an approval trigger, not a hard block, and --add-dir widens it. Whether it is a security guarantee is explicitly deferred (Q2.6.BR.intent). ⚑ Deferred. See SC-4 — File write aimed outside the working directory ⚑ Deferred.

R-3 — Concurrent same-file edits have no documented serialization. arc42 §6 documents concurrency but no write-conflict handling; the outcome of two patches to one file is interleaving-dependent. See SC-5 — Concurrent edits to the same file in one turn.

R-4 — The --max-price ceiling is approximate. session_cost ignores prompt caching (TD-001), so a cost-bounded autonomous run can overshoot or stop early. See SC-7 — --max-price ceiling overshoot.

R-5 — Sustained-outage behaviour is unspecified. Beyond the 300 s retry cap the end-state is undefined (R-009 / Q5.3). ⚑ Deferred. See SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred.

R-6 — ACP duplicates core tools. TD-002: acp/tools/ re-implements core/tools/ builtins; a tool fix must be made twice. See SC-11 — Editor integration over ACP.

Non-Risks

NR-1 — Orthogonal profile / permission design. The per-tool tier and the per-profile autonomy level are independent, so accept-edits can relax edits without relaxing shell execution (SC-2 — Calibrated autonomy in interactive accept-edits).

NR-2 — The trust-folder gate. A clean, bounded mitigation for T-005 at a one-prompt usability cost (SC-3 — First entry into an untrusted cloned repository) — subject only to the mild Q2.6.BR.intent caveat.

NR-3 — Hexagonal ports + backend factory. Compatibility and Maintainability with no opposing attribute materially harmed (SC-10 — A contributor adds a new LLM provider).

Scenarios Resting on a Deferred Question

Scenario Deferred leaf Effect on the evaluation

SC-1 — Destructive command in an unattended CI run ⚑ Deferred

Q3.8.1, Q4.9

Residual T-001 risk cannot be bounded; whether the autonomy gain justifies it cannot be ranked.

SC-4 — File write aimed outside the working directory ⚑ Deferred

Q2.6.BR.intent, Q3.8.1

The boundary cannot be classified as a guarantee or a convenience.

SC-6 — Context auto-compaction on a long session ⚑ Deferred

Q4.9

The acceptability of compaction’s fidelity loss depends on the Reliability-vs-Functional-Suitability rank.

SC-9 — Transient provider rate-limit, then a sustained outage ⚑ Deferred

Q5.3

The sustained-outage end-state is undefined.

All scenarios

Q4.9, Q1.2.2, Q1.6 (via [R-0])

The priority order, utility-tree weights and every "acceptable" judgement are provisional.

Conclusion

The architecture’s central decision — the tiered permission system plus agent safety profiles (AP-5) — is a well-formed tradeoff mechanism: SC-2 — Calibrated autonomy in interactive accept-edits shows the orthogonal tier/profile design lets autonomy be tuned without dismantling safety. The serious finding is not the mechanism but its unattended configuration: programmatic mode forces auto-approve (BR-5), which inverts the mechanism by collapsing every ASK to ALWAYS (SC-1 — Destructive command in an unattended CI run ⚑ Deferred, [R-1]).

That finding cannot be closed by architecture work alone. It rests on two unanswered questions — whether Safety is the top quality goal (Q4.9) and whether there is a threat model to bound the residual risk (Q3.8.1) — and on a third, the segment priority (Q1.2.2/Q1.6), that decides how much the unattended use case should weigh at all.

The single highest-value action is not a code change: it is answering Q4.9, Q3.8.1 and Q2.6.BR.intent. Until then this ATAM, like the architecture documentation it evaluates, can only describe the tradeoffs — it cannot adjudicate them.

Findings are recorded for disposition; no fixes were applied.