Agent Loop Architecture
The agent loop is the core iteration engine that drives every SideCar agentic interaction. It lives in src/agent/loop.ts as a thin 255-line orchestrator that reads top-to-bottom as one iteration’s pseudo-code. Every meaningful chunk of logic is delegated to a single-purpose helper under src/agent/loop/.
One iteration at a glance
```mermaid
flowchart TD
Start([runAgentLoop]) --> Init[initLoopState<br/>bundle state, config, tools]
Init --> Bus[HookBus setup<br/>defaultPolicyHooks +<br/>regressionGuardHooks +<br/>extraPolicyHooks]
Bus --> Loop{iteration <<br/>maxIterations?}
Loop -- no --> Finalize[finalize<br/>emit done, suggestions]
Finalize --> Return([return messages])
Loop -- yes --> Abort{signal.aborted?}
Abort -- yes --> Finalize
Abort -- no --> Compress[applyBudgetCompression<br/>pre-turn]
Compress --> Exhausted{exhausted?}
Exhausted -- yes --> BudgetBreak[emit budget warning] --> Finalize
Exhausted -- no --> Notify[notifyIterationStart<br/>maybeEmitProgressSummary<br/>shouldStopAtCheckpoint]
Notify --> Checkpoint{user stops<br/>at checkpoint?}
Checkpoint -- yes --> Finalize
Checkpoint -- no --> Stream[streamOneTurn<br/>SSE + tool_use parsing]
Stream --> Terminated{terminated?}
Terminated -- timeout --> TimeoutMsg[emit timeout] --> Finalize
Terminated -- aborted --> Finalize
Terminated -- no --> Resolve[resolveTurnContent<br/>strip repeats +<br/>parseTextToolCalls]
Resolve --> HasTools{pendingToolUses<br/>length > 0?}
HasTools -- no --> EmptyHook[hookBus.runEmptyResponse<br/>completion gate, etc.]
EmptyHook --> Mutated{any hook<br/>mutated state?}
Mutated -- yes --> Loop
Mutated -- no --> Finalize
HasTools -- yes --> Burst{exceedsBurstCap?}
Burst -- yes --> Finalize
Burst -- no --> Cycle{detectCycleAndBail?}
Cycle -- yes --> Finalize
Cycle -- no --> PushAsst[pushAssistantMessage]
PushAsst --> Exec[executeToolUses<br/>spawn_agent / delegate_task /<br/>normal dispatch in parallel]
Exec --> Account[accountToolTokens +<br/>pushToolResultsMessage]
Account --> PostCompress[maybeCompressPostTool]
PostCompress --> AfterHook[hookBus.runAfter<br/>auto-fix → stub → critic →<br/>completion-gate tracking]
AfterHook --> PlanMode{approvalMode=plan<br/>&& iter=1?}
PlanMode -- yes --> PlanEmit[emit plan for approval] --> Finalize
PlanMode -- no --> Loop
classDef hookStyle fill:#fef3c7,stroke:#d97706
classDef toolStyle fill:#dbeafe,stroke:#2563eb
classDef terminalStyle fill:#fee2e2,stroke:#dc2626
class Bus,EmptyHook,AfterHook hookStyle
class Stream,Exec toolStyle
class BudgetBreak,TimeoutMsg,Finalize terminalStyle
```
Submodule map
The orchestrator in loop.ts calls into focused helpers under src/agent/loop/:
| Helper | Responsibility |
|---|---|
| `state.ts` | `initLoopState` bundles immutable inputs + mutable accumulators into one `LoopState` object |
| `compression.ts` | `applyBudgetCompression` (pre-turn) + `maybeCompressPostTool` (after tool results) |
| `streamTurn.ts` | `streamOneTurn` owns the `streamChat` request loop with per-event timeout + abort handling; captures partial text for `/resume` on mid-stream failure |
| `textParsing.ts` | `resolveTurnContent` → `parseTextToolCalls` + `stripRepeatedContent` for models that emit tool calls as text (qwen3, Hermes) instead of structured `tool_use` |
| `cycleDetection.ts` | `exceedsBurstCap` (max tools per iteration) + `detectCycleAndBail` (ring buffer of recent tool+args tuples) |
| `messageBuild.ts` | `pushAssistantMessage` + `pushToolResultsMessage` + `accountToolTokens`: single source of truth for message-array mutation |
| `executeToolUses.ts` | Parallel tool dispatch; special-cases `spawn_agent` + `delegate_task`; threads `cwdOverride` into every `ToolExecutorContext` |
| `policyHook.ts` | `HookBus` + `PolicyHook` interface. Hooks fire via `runAfter` (post-tool) and `runEmptyResponse` (no tool calls this turn) |
| `builtInHooks.ts` | `defaultPolicyHooks()` wraps the four built-ins as `PolicyHook` adapters |
| `criticHook.ts` | Adversarial critic: spawns a second LLM call to review the agent’s edits; can push a synthetic user message demanding more work |
| `gate.ts` | Completion gate: refuses to let the agent end the turn without running lint/tests when it claims to be done |
| `stubCheck.ts` | Post-tool validator that rejects placeholder code (`TODO`, `// implement me`, …) |
| `notifications.ts` | `notifyIterationStart` + `maybeEmitProgressSummary` + `shouldStopAtCheckpoint` (user interrupt every N iterations) |
| `finalize.ts` | Post-loop teardown + next-step suggestion synthesis |
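The ring buffer behind `detectCycleAndBail` can be pictured with a minimal sketch like the one below. The class name, the window size, and the repeat threshold are all assumptions chosen for illustration; the source only says that recent tool+args tuples are kept in a ring and that too many repeats end the loop.

```typescript
// Hypothetical sketch: names and thresholds are assumptions, not SideCar's actual code.
interface ToolUse {
  name: string;
  input: unknown;
}

class CycleDetector {
  private readonly recent: string[] = [];
  private readonly windowSize: number;
  private readonly maxRepeats: number;

  constructor(windowSize = 8, maxRepeats = 3) {
    this.windowSize = windowSize; // assumed ring-buffer capacity
    this.maxRepeats = maxRepeats; // assumed repeat threshold
  }

  /** Records one tool call; returns true once the same tool+args tuple fills the repeat quota. */
  record(use: ToolUse): boolean {
    // Key order inside `input` matters here; a real implementation might canonicalize it.
    const key = `${use.name}:${JSON.stringify(use.input)}`;
    this.recent.push(key);
    if (this.recent.length > this.windowSize) {
      this.recent.shift(); // drop the oldest entry so the buffer stays bounded
    }
    const repeats = this.recent.filter((k) => k === key).length;
    return repeats >= this.maxRepeats;
  }
}
```

Because old entries fall out of the window, a tool call repeated occasionally over a long run does not trip the detector; only tight repetition does.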
Hook bus ordering
The HookBus runs hooks in registration order:
- Built-ins (auto-fix → stub validator → critic → completion-gate tracking): registered first via `defaultPolicyHooks()`.
- Regression guards: loaded from `sidecar.regressionGuards` config, gated behind `checkWorkspaceConfigTrust`.
- User extras: `options.extraPolicyHooks`, registered last. These see every mutation earlier hooks made to `state.messages`.
Two hook phases:
- `afterToolResults` (`hookBus.runAfter`): fires after every successful tool-execution turn. Hooks may push synthetic user messages that demand more work.
- `emptyResponse` (`hookBus.runEmptyResponse`): fires when the model produced no tool calls. Any hook that mutates state keeps the loop alive; if none mutate, the loop terminates naturally.
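A minimal sketch of the registration-order and two-phase behavior described above follows. The shapes of `PolicyHook` and `LoopState` here are assumptions made for illustration (the real interfaces are likely async and carry more context); only the names `HookBus`, `runAfter`, and `runEmptyResponse` come from the text.

```typescript
// Hypothetical shapes: the real PolicyHook and LoopState are probably richer and async.
interface LoopState {
  messages: { role: "user" | "assistant"; content: string }[];
}

interface PolicyHook {
  name: string;
  // Each method returns true when the hook mutated state (for example, pushed a message).
  afterToolResults?(state: LoopState): boolean;
  onEmptyResponse?(state: LoopState): boolean;
}

class HookBus {
  private readonly hooks: PolicyHook[] = [];

  /** Hooks run in the order they were registered. */
  register(...hooks: PolicyHook[]): void {
    this.hooks.push(...hooks);
  }

  /** Post-tool phase: every hook runs, and later hooks see earlier mutations. */
  runAfter(state: LoopState): boolean {
    let mutated = false;
    for (const hook of this.hooks) {
      if (hook.afterToolResults?.(state)) mutated = true;
    }
    return mutated;
  }

  /** No-tool-call phase: a true result means the loop should take another turn. */
  runEmptyResponse(state: LoopState): boolean {
    let mutated = false;
    for (const hook of this.hooks) {
      if (hook.onEmptyResponse?.(state)) mutated = true;
    }
    return mutated;
  }
}
```

In this sketch a hook that implements only one phase is simply skipped in the other, which is how a gate-style hook could track tool uses in one phase and inject in the other.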
Critic × Completion Gate interaction
Both the adversarial critic and the completion gate can push synthetic user messages into state.messages. They look superficially similar — both are post-turn policies that might keep the loop alive — but they fire in different phases and at different moments in the turn:
| Hook | Phase | Fires when | Can inject? |
|---|---|---|---|
| `autoFix` | `afterToolResults` | Lint / build / test errors detected post-edit | ✅ |
| `stubValidator` | `afterToolResults` | Placeholder code (`TODO`, `// implement me`) detected in the write | ✅ |
| `adversarialCritic` | `afterToolResults` | After `write_file` / `edit_file` / failed `run_tests` | ✅ |
| `completionGate` (tool recording) | `afterToolResults` | Every turn; feeds gate state with tool uses | ❌ never |
| `completionGate` (gate check) | `emptyResponse` | Model tried to terminate without verifying edits | ✅ |
Because the critic fires in afterToolResults and the gate’s injecting method (onEmptyResponse) fires in emptyResponse, they cannot both inject on the same turn — those are mutually exclusive branches in loop.ts. The gate’s afterToolResults method runs on every turn but is purely for tool-use tracking; it never pushes a message.
What does happen across successive iterations is more subtle:
```mermaid
sequenceDiagram
participant Agent
participant Critic
participant Gate
participant Runner
Note over Agent,Runner: Turn 1 - agent edits foo.ts
Agent->>Runner: emit edit_file foo.ts
Runner->>Critic: afterToolResults - edit trigger
Critic-->>Runner: inject "fix null check"
Note over Gate: records edit, does NOT inject
Runner->>Agent: Turn 2
Note over Agent,Runner: Turn 2 - agent addresses critic
Agent->>Runner: emit edit_file foo.ts
Runner->>Critic: afterToolResults - per-file cap 1 of 2
Critic-->>Runner: inject "new bug introduced"
Runner->>Agent: Turn 3
Note over Agent,Runner: Turn 3 - agent edits again, hits critic cap
Agent->>Runner: emit edit_file foo.ts
Runner->>Critic: afterToolResults - per-file cap 2 of 2, SKIP
Note over Critic: no injection, per-file cap reached
Runner->>Agent: Turn 4
Note over Agent,Runner: Turn 4 - agent believes it is done
Agent->>Runner: emit no tool calls
Runner->>Gate: onEmptyResponse
Gate-->>Runner: inject "run lint + tests before completion"
Runner->>Agent: Turn 5
Note over Agent,Runner: Turn 5 - agent runs tests, they fail
Agent->>Runner: emit run_tests, fails
Runner->>Critic: afterToolResults - test_failure trigger, UNBOUNDED
Critic-->>Runner: inject "root cause analysis"
Note over Runner: continues until maxIterations or another cap trips
```
Bounds that prevent infinite loops
- Critic: per-file injection cap. `MAX_CRITIC_INJECTIONS_PER_FILE = 2` in `src/agent/loop/criticHook.ts`. After two critic blocks on the same file within a run, the critic skips further blocks for that file. Applies to `edit` triggers only.
- Critic: test-failure triggers are NOT per-file-capped. The per-file counter is keyed by `filePath`, and `test_failure` triggers don’t name a single file. A gate-forced test run that keeps failing can keep firing the critic turn after turn. In practice this is bounded by the outer iteration cap; in the worst case you burn critic calls (Haiku: ~$0.02 each on Anthropic backends) until `maxIterations` trips.
- Gate: total injection cap. `MAX_GATE_INJECTIONS = 2` in `src/agent/loop/gate.ts`. After two gate reprompts in a run, the gate logs a warning and allows termination with unverified edits rather than looping forever.
- Loop: iteration cap. `sidecar.agentMaxIterations` (default 25). Ultimate backstop.
- Cycle detection. Same tool+args tuple repeated N times triggers `detectCycleAndBail`.
- Burst cap. Too many tools attempted in one iteration triggers `exceedsBurstCap`.
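The critic and gate caps above can be sketched as two small counters. The two constants and their values come from the text; the `InjectionBudget` class, the `CriticTrigger` type, and the method names are hypothetical and exist only to show why `test_failure` triggers escape the per-file cap.

```typescript
// Constants taken from the text; everything else is a hypothetical illustration.
const MAX_CRITIC_INJECTIONS_PER_FILE = 2;
const MAX_GATE_INJECTIONS = 2;

type CriticTrigger =
  | { kind: "edit"; filePath: string }
  | { kind: "test_failure" };

class InjectionBudget {
  private readonly criticByFile = new Map<string, number>();
  private gateInjections = 0;

  /** Returns true when the critic may inject for this trigger, and counts the injection. */
  allowCritic(trigger: CriticTrigger): boolean {
    // A test failure names no single file, so there is no key to count against.
    if (trigger.kind === "test_failure") return true;
    const used = this.criticByFile.get(trigger.filePath) ?? 0;
    if (used >= MAX_CRITIC_INJECTIONS_PER_FILE) return false;
    this.criticByFile.set(trigger.filePath, used + 1);
    return true;
  }

  /** Returns true when the gate may reprompt; after the cap it lets the run terminate. */
  allowGate(): boolean {
    if (this.gateInjections >= MAX_GATE_INJECTIONS) return false;
    this.gateInjections += 1;
    return true;
  }
}
```

Under these assumptions, only the iteration cap, cycle detection, or the burst cap can stop a run that keeps producing `test_failure` triggers.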
Known lockup-risk scenario
The realistic worst case is gate → test failure → critic loop:
1. Agent edits; the critic’s per-file cap exhausts (2 blocks).
2. Agent tries to terminate; gate injects “run tests.”
3. Tests fail; critic `test_failure` trigger fires (no per-file cap).
4. Agent edits to fix, but edit triggers are now capped for these files.
5. Repeat steps 2–4 until `MAX_GATE_INJECTIONS` (2) is hit or `maxIterations` (25) trips.
You can burn 20+ iterations and 10+ critic calls before the outer cap fires. For a user on Sonnet with critic defaulting to Haiku, that’s ~$1–2 of API spend on a single stuck turn.
Escape hatches for a stuck loop
- Abort via the chat UI (cancel button): `signal.aborted` is checked between iterations and the loop exits immediately.
- Disable the critic for the session: `sidecar.critic.enabled: false` (or toggle via the settings UI’s new “SideCar: Safety & Review” section).
- Disable the gate: `sidecar.completionGate.enabled: false`.
- Lower `sidecar.agentMaxIterations` to cap spend per run.
- Inspect `SideCar: Show Session Spend`: the critic session-stats view added in v0.62.1 shows `blockedTurns` + `lastBlockedReason` so you can tell whether the critic is what’s looping.
Why the test-failure trigger is unbounded
It’s a deliberate design trade, not an oversight: a test that keeps failing for different reasons across iterations is exactly the situation where the critic’s analysis is most valuable. Bounding test-failure triggers per-file would mute the critic precisely when an agent is flailing. The unbounded behavior is bounded-enough-in-practice by maxIterations + MAX_GATE_INJECTIONS. A future improvement would be a per-test-output hash cap (don’t re-fire the critic when the test output hasn’t materially changed) — tracked as an open item.
Prompt pruner safety model
The promptPruner runs in the backend layer on every request to a paid backend (Anthropic / OpenAI-compatible). It’s a lossy-but-bounded transform that protects against three scenarios:
- Oversize tool_result blocks (e.g., a `read_file` on a 100KB file) that would crowd out the conversation history.
- Duplicate tool outputs across a turn (e.g., the agent re-runs the same `search_files` query three times).
- Trailing whitespace runs from tool outputs that pad the prompt without carrying signal.
Three transforms, one contract
```mermaid
flowchart LR
Msgs[messages array] --> W[collapseWhitespace<br/>3+ blank lines → 2]
W --> T[truncateToolResult<br/>head 60% + tail 40% +<br/>elision marker]
T --> D[dedupeToolResults<br/>same content → back-reference<br/>EXCEPT exempt tools]
D --> Send[send to backend]
classDef safeStyle fill:#dcfce7,stroke:#16a34a
classDef cautionStyle fill:#fef3c7,stroke:#d97706
class W,T safeStyle
class D cautionStyle
```
The contract is: the pruner NEVER touches user message text, assistant reasoning, or tool_use inputs. It only transforms tool_result blocks and whitespace in between. This keeps the pruner safe to enable by default (sidecar.promptPruning.enabled: true).
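The head+tail transform can be sketched as follows. This is an illustration under stated assumptions, not the pruner’s actual code: it budgets in characters rather than tokens, the function name `truncateHeadTail` is hypothetical, and the marker text copies the elision marker quoted in the practical guidance for truncated results.

```typescript
// Sketch only: budgets in characters (UTF-16 code units), whereas the text describes a token cap.
function truncateHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text; // under budget: leave untouched

  // 60% head / 40% tail, using integer arithmetic to avoid float rounding.
  const headLen = Math.floor((maxChars * 3) / 5);
  const tailLen = maxChars - headLen;
  const elided = text.length - headLen - tailLen;

  // The count is in characters here even though the marker says "bytes".
  const marker = `[...${elided} bytes elided by SideCar prompt pruner...]`;
  const head = text.slice(0, headLen);
  const tail = text.slice(text.length - tailLen);
  return `${head}\n${marker}\n${tail}`;
}
```

The middle of the input is the only region that can be dropped, which is why outputs whose signal sits in the middle are the risky case.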
Which transforms apply to which tools
- `collapseWhitespace`: applied universally. Runs of 3+ blank lines become 2. No user-visible impact; preserves all non-whitespace signal.
- `truncateToolResult`: applied to every tool_result block that exceeds `sidecar.promptPruning.maxToolResultTokens` (default ~4K tokens). Head + tail + elision marker preserves the error signal at the top AND the failing line at the bottom, which is where the signal lives in 90% of tool output (compile errors, test failures, file reads where the function of interest is near a function signature).
- `dedupeToolResults`: applied to most tool_result blocks, but EXEMPT for a specific set:

  ```typescript
  const DEDUP_EXEMPT_TOOLS = new Set([
    'read_file', // agent reads foo.ts, edits it, re-reads it: MUST see new content
    'get_diagnostics', // lint/type errors change after edits
    'git_diff', // diff vs. HEAD changes after every stage/commit
    'git_status', // working-tree state changes after every tool call
  ]);
  ```

  These tools’ outputs are expected to vary across consecutive calls with identical inputs. Dedup’ing them with a back-reference (“identical to previous tool_result”) is a trap: the agent gets the stale content by reference even though it wrote a newer version. The audit finding that added this exemption (v0.62.1 p.2b) caught exactly this trap in an eval scenario where the agent couldn’t tell its own edit had landed.
Truncation still applies to exempt tools — size management is always legitimate; the exemption is only about the back-reference shortcut.
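A minimal sketch of how an exempt set can short-circuit dedup is shown below. The `ToolResult` shape, the placeholder text, and the choice to key duplicates on tool name plus content are assumptions for illustration; the source only states that identical content becomes a back-reference unless the tool is exempt.

```typescript
// Hypothetical result shape; the real pruner operates on tool_result blocks in the message array.
interface ToolResult {
  toolName: string;
  content: string;
}

const DEDUP_EXEMPT_TOOLS = new Set(["read_file", "get_diagnostics", "git_diff", "git_status"]);

// Placeholder wording is an assumption, loosely based on the back-reference described in the text.
const BACK_REFERENCE = "[identical to previous tool_result]";

function dedupeToolResults(results: ToolResult[]): ToolResult[] {
  const seen = new Set<string>();
  return results.map((result) => {
    // Exempt tools always pass through, even when the content repeats.
    if (DEDUP_EXEMPT_TOOLS.has(result.toolName)) return result;

    // Assumed key: same tool and same content counts as a duplicate.
    const key = `${result.toolName}\u0000${result.content}`;
    if (seen.has(key)) {
      return { ...result, content: BACK_REFERENCE };
    }
    seen.add(key);
    return result;
  });
}
```

The function returns new objects for replaced results and leaves the input array untouched, so a caller can compare before and after when debugging.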
How to decide whether to exempt a new tool from dedup
Check both questions. Answer “yes” to either → add to DEDUP_EXEMPT_TOOLS:
- Does this tool’s output vary meaningfully across consecutive calls with identical inputs?
  - Yes for `read_file` (file contents change), `list_directory` (may change if the agent creates files), `git_status` (tree state mutates), `get_diagnostics` (errors resolve).
  - No for `search_files` with a fixed query (should be stable; dedup is safe).
- Would collapsing identical outputs into a back-reference lose user intent the agent needs?
  - Yes for diff-like tools where the agent is tracking changes over time.
  - No for most read-only knowledge tools (`web_search`, `find_references`): if the agent re-runs the same search, dedup is fine.
When in doubt, lean toward exempting. The cost of a false exemption is a few extra bytes in the prompt; the cost of a false dedup is an agent that can’t see its own work.
Truncation safety by tool
Unlike dedup, truncation has no exempt list. Every oversize tool_result flows through truncateToolResult and gets the head+tail transform regardless of which tool produced it. The head+tail strategy works because most tool output is “signal at the top, signal at the bottom, filler in the middle.” That’s usually true — but it’s a shape-of-output assumption, and it breaks in specific ways per tool.
```mermaid
flowchart LR
subgraph friendly ["Truncation-friendly — signal clusters at head/tail"]
direction TB
F1[run_command / run_tests<br/>stderr header + exit-code tail]
F2[get_diagnostics<br/>severity-sorted, first errors most actionable]
F3[read_file<br/>if agent knows line range]
F4[git_log / git_diff<br/>newest commits / hunks at head]
end
subgraph hostile ["Truncation-hostile — signal scattered through the middle"]
direction TB
H1[grep<br/>matches distributed throughout file]
H2[search_files<br/>relevance-ranked, not position-sorted]
H3[web_search<br/>results 3-8 often better than 1-2]
H4[project_knowledge_search<br/>cosine-ranked hits interleaved with graph-walk results]
H5[list_directory<br/>alphabetical, file of interest mid-listing]
end
classDef safeStyle fill:#dcfce7,stroke:#16a34a
classDef dangerStyle fill:#fee2e2,stroke:#dc2626
class F1,F2,F3,F4 safeStyle
class H1,H2,H3,H4,H5 dangerStyle
```
Truncation-friendly tools — head+tail captures the actual signal:
| Tool | Why head+tail works |
|---|---|
| `run_command`, `run_tests` | Error banner at top; exit code + failing-assertion summary at bottom. Middle = test-runner chatter, build steps, progress bars. |
| `get_diagnostics` | Errors come back severity-sorted; highest-severity items are in the head. Tail repeats the summary counts. |
| `git_log`, `git_diff` | Most recent commit / first hunk at top. Tail varies (final commit / last hunk) but the head carries the “what changed recently” signal. |
| `read_file` with line range | When the agent passes `start_line`/`end_line`, the output is small enough that truncation never fires. Risk only on full-file reads of giant files. |
Truncation-hostile tools — head+tail can drop the match the agent needed:
| Tool | Failure mode |
|---|---|
| `grep` | Matches distribute throughout the searched file. A 200-match grep result truncated at `maxToolResultTokens` might elide matches 41–159; the agent sees matches 1–40 and 160–200, which are usually the least interesting (boilerplate imports + trailing tests). |
| `search_files` | Returns by relevance + freshness; there’s no position-ordering guarantee. The middle of the list can carry the hit the agent actually wants. |
| `web_search` | Results are ranked by the search provider, but the most actionable result for an error-message lookup is often result 3–5 (Stack Overflow answer, not the marketing page at rank 1). Middle-elision drops those. |
| `project_knowledge_search` | Hits come back with `vector:` and `graph:` relationship labels interleaved. Graph-walk hits (callers/callees 1–2 hops out) often sit in the middle of the ranked list and are the “oh that’s why” evidence for the agent. |
| `list_directory` | Alphabetical listing. For a directory with 500 files, the file the agent was looking for is more likely to sit in the elided middle than in the retained head or tail. |
Practical guidance:
- Default `maxToolResultTokens` is ~4K tokens, which is usually enough to avoid truncation entirely on truncation-hostile tools (most grep / search results fit). If you’re on a small-context model or hit truncation regularly on these tools, raise the cap via `sidecar.promptPruning.maxToolResultTokens` before assuming the agent is “missing” information.
- When reading a truncated tool_result, look for the elision marker: `[...N bytes elided by SideCar prompt pruner...]`. If the agent is confused after a truncation-hostile tool call, the elided bytes are the first place to look.
- For grep / search workloads, prefer narrower queries. A pre-scoped `grep -r "needle" src/auth/` is better than `grep -r "needle" .` because the narrower result fits under the budget and doesn’t get elided.
- Disable pruning for debugging: `sidecar.promptPruning.enabled: false` bypasses truncation entirely. Useful when tracking down “why did the agent ignore result X”: re-run with pruning off and see whether the hit was in the elided region.
Known gap: no per-tool truncation strategy
The current pruner applies one strategy (head+tail with 60/40 split) to every tool. A future improvement would be a per-tool truncation dispatch:
- `grep` / `search_files`: slice to the top-N matches by a relevance score instead of by position.
- `list_directory`: sort-and-truncate by path-relevance to the active file.
- `project_knowledge_search`: already has its own top-K in the tool; a per-tool smaller budget would let the pruner drop additional matches past N.
Tracked as an open item. Until then, tuning maxToolResultTokens upward is the escape hatch for workloads that exercise truncation-hostile tools heavily.
Observability
The pruner emits a PruneStats object on every request:
- `truncatedBytes`: total bytes dropped by head+tail truncation.
- `dedupedBytes`: total bytes saved by dedup (back-references).
- `whitespaceBytes`: total bytes saved by whitespace collapse.
- `truncatedByTool`: per-tool breakdown of truncation (e.g., `{ read_file: 12000, run_command: 4000 }`).
When any of these is non-zero, `formatPruneStats(stats)` emits a one-line summary to the SideCar output channel (`console.info`). Users who wonder “did the pruner eat my error message?” can check the log for a conclusive answer.
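The stats object and a one-line formatter might look like the sketch below. The four field names come from the list above; the `formatPruneStatsSketch` name, its `null` return for all-zero stats, and the exact wording of the summary line are assumptions, since the source does not show the real output format.

```typescript
interface PruneStats {
  truncatedBytes: number;
  dedupedBytes: number;
  whitespaceBytes: number;
  truncatedByTool: Record<string, number>;
}

// Hypothetical formatter: the real formatPruneStats may word its summary differently.
function formatPruneStatsSketch(stats: PruneStats): string | null {
  const total = stats.truncatedBytes + stats.dedupedBytes + stats.whitespaceBytes;
  if (total === 0) return null; // nothing was pruned, so nothing to log

  const perTool = Object.entries(stats.truncatedByTool)
    .map(([tool, bytes]) => `${tool}=${bytes}`)
    .join(", ");
  const detail = perTool ? ` (${perTool})` : "";

  return (
    `pruner: truncated ${stats.truncatedBytes}B${detail}, ` +
    `deduped ${stats.dedupedBytes}B, whitespace ${stats.whitespaceBytes}B`
  );
}
```

A caller would log the returned string only when it is not `null`, which matches the “non-zero” condition described above.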
Termination paths
The loop can exit via any of:
- Natural end: model produced no tool calls and no hook wanted to reprompt.
- Plan mode: first iteration completed, `onPlanGenerated` fired for user approval.
- Iteration cap: `state.iteration >= state.maxIterations`.
- Abort: `signal.aborted` between iterations or mid-stream.
- Budget exhaustion: `applyBudgetCompression` couldn’t fit under `maxTokens`.
- Burst cap: too many tools attempted in one iteration.
- Cycle detected: same tool+args tuple seen too many times in the recent ring.
- Timeout: a single stream turn exceeded `sidecar.requestTimeout`.
- Checkpoint refused: `onCheckpoint` returned `false`.
All paths route through finalize(state, callbacks) which emits the final onDone callback and synthesizes next-step suggestions.
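The single-exit shape can be illustrated with a small synchronous sketch. The reason labels, `SketchState`, and `runLoopSketch` are hypothetical names; the real loop is asynchronous and `finalize(state, callbacks)` does more than call one callback. The point is only that every `break` and the failed loop condition converge on one teardown call.

```typescript
// Hypothetical labels mirroring the termination list; the real loop may not name them this way.
type TerminationReason =
  | "natural_end"
  | "plan_mode"
  | "iteration_cap"
  | "abort"
  | "budget_exhausted"
  | "burst_cap"
  | "cycle_detected"
  | "timeout"
  | "checkpoint_refused";

interface SketchState {
  iteration: number;
  maxIterations: number;
}

// One turn returns a reason to stop, or null to keep iterating.
type Turn = (state: SketchState) => TerminationReason | null;

function runLoopSketch(
  state: SketchState,
  turn: Turn,
  onDone: (reason: TerminationReason) => void,
): TerminationReason {
  // Default applies when the while condition fails without any turn asking to stop.
  let reason: TerminationReason = "iteration_cap";
  while (state.iteration < state.maxIterations) {
    state.iteration += 1;
    const stop = turn(state);
    if (stop !== null) {
      reason = stop;
      break;
    }
  }
  onDone(reason); // stand-in for finalize(state, callbacks): the single exit point
  return reason;
}
```

Keeping teardown in one place means a new termination path only needs a new `break`, not a new copy of the cleanup logic.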
Per-run isolation
options.toolRuntime is a per-run ToolRuntime carrying the persistent shell session + symbol-graph reference. BackgroundAgentManager creates a fresh ToolRuntime per run and disposes it in finally so parallel background agents don’t share a shell — two agents both doing cd or export would otherwise trample each other. options.cwdOverride pins every tool call’s working directory, used by Shadow Workspaces to route fs writes into an ephemeral git worktree.