Agent Mode
Agent Mode
SideCar isn’t just a chatbot — it’s an autonomous coding agent. It can read your code, edit files, run tests, diagnose errors, and iterate until the task is done.
The agent loop
When you give SideCar a task, it enters an agent loop:
- Read — examines relevant files, diagnostics, and context
- Plan — decides which tools to use
- Execute — writes/edits files, runs commands
- Verify — checks diagnostics and test results
- Iterate — if something failed, goes back to step 1
The loop continues until the task is complete, the model returns a final response, or safety limits are reached.
Approval modes
Control how much autonomy SideCar has via the agent mode dropdown in the chat header, or the sidecar.agentMode setting:
| Mode | Reads | Writes | Destructive |
|---|---|---|---|
| Cautious (default) | Auto-approve | Confirm | Confirm |
| Autonomous | Auto-approve | Auto-approve | Confirm |
| Manual | Confirm | Confirm | Confirm |
| Review | Auto-approve | Buffered — see below | Confirm |
- Cautious is recommended for most use. File reads happen automatically; writes and edits open a VS Code diff editor showing the proposed changes. Accept/Reject buttons appear both as a VS Code notification (in the editor) and as a chat card — click whichever is more convenient.
- Autonomous lets SideCar work without interruption. Use for trusted tasks where you’ll review changes after.
- Manual requires approval for every tool call, including file reads.
- Review is the safest mode for large multi-file changes. Writes and edits never hit disk during the agent run — they’re buffered into a Pending Agent Changes panel where you can review each file’s diff and accept or discard it before anything is persisted. See the section below.
Diff preview in cautious mode
When the agent proposes a file write or edit in cautious mode:
- VS Code’s diff editor opens immediately, showing the original file on the left and proposed changes on the right
- An “Accept / Reject” notification appears in the editor
- A parallel confirmation card appears in the chat panel
- First click wins — respond via whichever is more convenient
- On Accept, the file is written; on Reject, the tool call is denied and the agent is informed
For edit_file operations, if inline edit (ghost text) is available, edits appear as Tab-to-accept suggestions at the edit location instead.
Review mode — batch diff review
Review mode is designed for multi-file refactors and anything else you’d want to audit before it touches disk. When sidecar.agentMode is set to review:
- Nothing hits disk during the run. Every
write_fileandedit_filecall is captured into an in-memoryPendingEditStorewith the file’s pre-edit content as a revert baseline. - Reads stay consistent with the agent’s view. If the agent later calls
read_fileon a path it has already edited this session, the executor returns the pending content instead of the disk content. The agent sees a coherent picture of its own in-progress work without the user having to approve each individual step. - The Pending Agent Changes panel (in the SideCar activity bar, below the chat) refreshes live as the store changes. Each entry shows the basename, its workspace-relative directory, an icon indicating “new file” (diff-added) or “modified file” (diff-modified), and a tooltip with the last tool that touched it.
- Clicking any file opens VS Code’s native diff editor via
vscode.diff. The left (baseline) side shows the content captured at the first write, not whatever’s currently on disk — the diff stays stable even if you edit the file outside SideCar while a review is pending. - Resolve pending changes via:
- Inline icons on each row: ✓ to accept (writes the pending content to disk, creating parent directories for new files), ✕ to discard.
- Panel title bar commands: Accept All / Discard All. Discard All asks for modal confirmation since it can’t be undone.
- Command palette:
SideCar: Accept Pending Change,SideCar: Discard Pending Change,SideCar: Accept All Pending Changes,SideCar: Discard All Pending Changes,SideCar: Show Pending Diff.
Session semantics: multiple edits to the same file collapse into a single pending entry with one consolidated before → after diff. The revert baseline is locked on the first capture, so successive edits in the same turn don’t lose the original state. If you run the agent again without resolving previous pending edits, the new edits add to the same store — carry-forward is intentional so you can chain multiple agent turns before reviewing.
v1 limitations (deferred to follow-up work):
- No hunk-level accept/reject — it’s all-or-nothing per file.
- No persistence across VS Code window reloads. If VS Code restarts mid-review, the pending edits are lost.
- No conflict detection. If you modify a file on disk while a review is pending, Accept will overwrite your edits silently.
- No in-chat affordance to toggle into review mode for a single turn — change
sidecar.agentModeor use a custom mode.
Stub validator
After the agent writes or edits files, SideCar automatically scans the output for placeholder patterns:
- TODO/FIXME/HACK comments
- “implement this” / “placeholder” / “stub” comments
- “real implementation” / “actual implementation” deferrals
throw new Error('Not implemented')/raise NotImplementedError- “for now” hedging, “would need” future deferral
- Python
pass-only function bodies, ellipsis-only bodies (...)
If placeholders are detected, SideCar reprompts the model to replace them with complete implementations. This happens automatically (1 retry) and the user sees a message in chat when it fires. Issue tracker references like TODO(#123) are excluded to avoid false positives.
Safety guardrails
| Setting | Default | Description |
|---|---|---|
sidecar.agentMaxIterations |
25 |
Max loop iterations before auto-stop |
sidecar.agentMaxTokens |
100000 |
Max total tokens before auto-stop |
sidecar.agentTemperature |
0.2 |
Temperature for tool-calling requests (lower = more deterministic) |
sidecar.requestTimeout |
120 |
Timeout per LLM request in seconds (0 to disable) |
Additional safety mechanisms:
- Cycle detection — detects repeating tool-call patterns and halts to prevent infinite loops. Length-1 patterns (the same call repeated) require 4 consecutive identical calls to trip, so agents can legitimately re-run a tool to verify after edits or retry tests after fixes without getting cut off. Length 2..4 patterns (A,B,A,B or A,B,C,A,B,C) still trip after two full cycles, since they’re a much clearer signal of a stuck loop.
- Completion gate — forces the agent to verify its work before declaring a task done. On every agent turn, SideCar tracks which files were edited (
write_file/edit_file) and which verification commands ran (run_tests,eslint,tsc,vitest,jest,pytest,npm test). If the agent tries to terminate a turn without having run lint or the colocated tests for a file it edited, a synthetic user message is injected into the loop demanding the checks before the turn can end. Capped at 2 injections per turn — after exhaustion the loop terminates with a warning rather than hanging. Catches the failure mode where the model reports a change as “ready for use” without ever running the checks it claims pass. Toggle withsidecar.completionGate.enabled(default on). - Tool support auto-detection — if a model fails to use tools after 3 attempts, SideCar stops sending tool definitions to avoid wasting context
- Request timeout — if no tokens arrive within
requestTimeoutseconds, the request is aborted with a user-friendly message
Conversation steering
The chat input stays enabled while SideCar is processing. You can:
- Type and send a new message at any time — this aborts the current agent run and starts a new one with your message
- Press Escape to abort the current run (equivalent to clicking Stop)
- Click the Send/Stop button — shows “Stop” when the input is empty (aborts), switches to “Send” when you start typing
This lets you steer the agent mid-run without waiting for it to finish. The backend cleanly aborts the previous loop before processing your new message.
Built-in tools
SideCar has 62 built-in tools the agent can use. The table below covers the core agentic tools; additional specialized tools are available for databases, vision verification, PDF/Zotero research, notebook mode, dependency scanning, and doc-to-test synthesis (some gated by feature flags). delegate_task is only exposed when the active backend is paid.
| Tool | Description |
|---|---|
read_file |
Read file contents |
write_file |
Create or overwrite files |
edit_file |
Search/replace edits in existing files |
search_files |
Glob pattern file search |
grep |
Content search with regex |
run_command |
Execute shell commands in a persistent session (env/cwd persist, background support) |
list_directory |
List directory contents |
get_diagnostics |
Read compiler errors, warnings, and security findings |
run_tests |
Run test suites with auto-detection |
spawn_agent |
Spawn parallel sub-agents for complex tasks (max depth: 3, max 15 iterations each) |
delegate_task (paid backends only) |
Offload read-only research to a local Ollama worker — see below |
git_status |
Show working tree status |
git_stage |
Stage files for commit |
git_commit |
Create a commit |
git_log |
View commit history |
git_push |
Push to remote |
git_pull |
Pull from remote |
git_branch |
Create/switch/list branches |
git_stash |
Stash/pop changes |
git_diff |
Show file diffs |
find_references |
Find symbol references across the workspace |
web_search |
Search the web via DuckDuckGo |
display_diagram |
Extract and render diagrams from markdown files |
ask_user |
Ask the user a clarifying question with selectable options |
check_dependencies |
Scan package.json / requirements.txt / Cargo.toml / go.mod for outdated and vulnerable deps (OSV API) |
kickstand_list_loras (Kickstand only) |
List LoRA adapters currently attached to a loaded Kickstand model |
kickstand_attach_lora (Kickstand only) |
Attach a LoRA adapter to a loaded Kickstand model without reloading — requires approval |
kickstand_detach_lora (Kickstand only) |
Detach a previously-attached LoRA adapter — requires approval |
Additional tools can be added via MCP servers and custom tools.
Kickstand LoRA tools (new in v0.67.1)
When the active backend exposes a loraAdapters capability (Kickstand does; other backends don’t), three agent tools are available for runtime LoRA management:
kickstand_list_loras(model_id)— read-only inventory. Returns[{ id, path, scale }]for every adapter currently attached to a loaded model.kickstand_attach_lora(model_id, path, scale?)— attach a GGUF adapter at an absolute server-readable path.scaledefaults to1.0; Kickstand accepts 0.0–2.0. Multiple adapters stack on one base. Returns the Kickstand-assigned adapter id in the human-readable summary.kickstand_detach_lora(model_id, adapter_id)— detach by id. Pair withkickstand_list_lorasto find the id.
Use cases: role-shape a model mid-task (attach a Python adapter before touching src/python/**, detach when moving on), A/B test adapter scales without reloading the base, stack multiple domain adapters for polyglot projects. Non-Kickstand backends return a clear “not supported — use switch_backend” message instead of crashing the tool loop.
The attach + detach tools require user approval per call (like git_commit / run_command); listing is read-only. Approval is not mandatory-always the way update_setting is — autonomous-mode users can opt into auto-approve via toolPermissions since LoRA state is ephemeral + undoable. See Kickstand model & adapter management for the palette-only equivalents.
Hybrid delegation — delegate_task
When the active backend is paid (Anthropic, OpenAI), SideCar exposes a delegate_task tool to the orchestrator. This is a hybrid-architecture tool that lets the frontier model offload read-only research to a local Ollama worker so you don’t pay for tokens spent on file reads and greps.
How it works:
- The frontier model calls
delegate_task({task, context?})when it needs to explore the codebase — “find all callers ofauthenticate()”, “summarize how tool execution flows throughsrc/agent/”, “grep for TODO comments related to caching”. - SideCar spawns a fresh
SideCarClientpointed athttp://localhost:11434(or the configuredsidecar.delegateTask.workerBaseUrl) with its own system prompt and a read-only tool subset:read_file,grep,search_files,list_directory,get_diagnostics,find_references,git_*,display_diagram. - The worker runs its own mini agent loop (max 10 iterations) with
autonomousapproval mode and produces a compact structured summary — file paths, symbol names, line numbers, recommendations. - The summary is returned to the orchestrator as the
tool_result. The orchestrator never sees the raw file contents or grep output. - The worker’s token consumption does not count against the orchestrator’s char budget. Local Ollama is free; the paid model only pays for reasoning and synthesis.
Design choices:
- Read-only by design — the worker cannot write files, run commands, or make changes. If the task asks for edits, the worker is instructed to describe what should change and leave the actual edits to the orchestrator.
- Not exposed to the worker — the worker doesn’t know
delegate_taskorspawn_agentexist, so it can’t recursively delegate or spiral. - Hidden from local-only setups — the tool definition is only added to
getToolDefinitions()when the provider isanthropicoropenai. Local Ollama users don’t see a pointless option. - Configurable worker —
sidecar.delegateTask.workerModelpicks which Ollama model runs the worker (default: same as chat). A code-tuned model likeqwen3-coder:30bordeepseek-coder:33bgives the best research results.
Example savings — in a typical “refactor this module” task where the frontier model needs to read 8-12 files to understand the current shape, delegating the reads to a local worker cuts the paid-model input tokens by 60-80% on the first turn, then the worker’s structured summary caches well into subsequent turns.
Toggle via sidecar.delegateTask.enabled (default on). See Cost controls for configuration details.
Tool approval and modal dialogs
When the agent calls a tool that requires approval, the prompt location depends on what the tool does:
- Write tools (
write_file,edit_file) — go through the streaming diff preview. The proposed content opens in VS Code’s native diff editor; accept or reject via an editor notification, a chat card, or keyboard shortcuts. Whichever you click first wins. - Destructive tools (
run_command,run_tests,git_stage,git_commit,git_push,git_pull,git_branch,git_stash) — open a native blocking modal (showWarningMessage({modal: true})) that you must dismiss before the editor responds to anything else. The modal body contains a short title (Allow SideCar to run run_command?) and a detail line with the full input (command: npm test). You can’t miss the prompt while scrolled away from chat, and it can’t auto-dismiss. - Read-only tools (
grep,search_files,find_references,list_directory,get_diagnostics,git_diff,git_status,git_log,git_branch list) — keep the existing inline chat card since they’re quick, non-destructive, and don’t justify a blocking interruption.
The rule of thumb: if it could break something, block the editor until you decide. If it’s just reading data, use the quieter in-chat surface.
Clarifying questions
When SideCar encounters an ambiguous request or multiple valid approaches, it can use the ask_user tool to ask for clarification before proceeding. This presents a card in chat with:
- Selectable option buttons — suggested approaches the user can click to choose
- Custom text input — a free-text field if none of the options fit
The agent loop pauses until the user responds, then continues with the selected approach. This prevents SideCar from guessing wrong on ambiguous tasks.
SideCar also tracks when the assistant asks a question in regular prose (ending with ?). If the user’s next reply is short (under 20 words), SideCar automatically adds context so the LLM understands it’s a response to the question, not a new task.
Tool permissions
Override the approval behavior for specific tools:
"sidecar.toolPermissions": {
"run_command": "ask",
"read_file": "allow",
"write_file": "deny"
}
Values: "allow" (always run), "deny" (always block), "ask" (always confirm). Per-tool permissions override the agent mode.
Diff preview
In cautious mode, write_file and edit_file open VS Code’s built-in diff editor showing the proposed changes side-by-side with the original. You accept or reject via an inline confirmation card.
Undo / rollback
SideCar snapshots every file before modifying it. To revert all changes:
- Press
Cmd+Shift+U/Ctrl+Shift+U - Or type
/undoin the chat
After an agent run, a change summary panel shows all modified files with inline diffs. You can revert individual files or accept all changes.
Custom modes
Define your own agent modes with dedicated system prompts and approval behavior via sidecar.customModes:
"sidecar.customModes": [
{
"name": "architect",
"description": "Design only, no file writes",
"systemPrompt": "Focus on architecture decisions and API design. Propose changes but do not write files — describe what should change and why.",
"approvalBehavior": "manual",
"toolPermissions": { "write_file": "deny", "edit_file": "deny" }
},
{
"name": "debugger",
"description": "Diagnostic mode",
"systemPrompt": "Focus on diagnosing the problem. Read files, check logs, run tests, and explain what's wrong before making changes.",
"approvalBehavior": "cautious"
}
]
Each custom mode has:
- name — identifier shown in the mode dropdown
- description — tooltip text in the dropdown
- systemPrompt — additional instructions injected into the system prompt when active
- approvalBehavior —
"autonomous","cautious", or"manual"(determines tool approval) - toolPermissions — optional per-tool overrides (
"allow","deny","ask") that take priority over globalsidecar.toolPermissions
Select custom modes from the dropdown alongside built-in modes (cautious, autonomous, manual, plan, review). Custom modes appear with an orange badge.
Background agents
Run tasks in parallel without blocking your main conversation using /bg <task>:
/bg Write unit tests for src/utils/parser.ts
/bg Refactor the authentication middleware to use async/await
Each background agent:
- Gets its own independent client instance (no shared state with the main chat)
- Runs in autonomous mode with a 15-iteration cap
- Streams output in real-time to a collapsible dashboard panel below the header
- Can be stopped individually via a stop button
The dashboard shows running, queued, completed, and failed agents with elapsed time and tool call counts. Click the expand button on any agent to see its full output.
Concurrency: Up to 3 background agents run simultaneously (configurable via sidecar.bgMaxConcurrent). Additional tasks are queued and start automatically when a slot opens.
When a background agent completes, a summary is posted to the main chat so you see the result without checking the dashboard.
Typed Sub-Agent Facets (new in v0.66)
Facets are a step up from background agents: named specialists with their own tool allowlist, preferred model, and composed system prompt, dispatched in parallel through the Command Palette. Where /bg spawns a generic autonomous agent, SideCar: Facets: Dispatch Specialists dispatches a specific role — security-reviewer, test-author, latex-writer, etc. — against a task.
Key differences from background agents:
/bg background agent |
Facets | |
|---|---|---|
| Tool surface | Full agent-loop tool registry | Per-facet allowlist (e.g. test-author only gets read + write + run_tests) |
| Model choice | Global sidecar.model |
Per-facet preferredModel (pinned for the run, restored after) |
| System prompt | Standard agent prompt | Facet persona composed on top of orchestrator’s rules |
| Review | Per-agent accept/reject inline | Aggregated batch review with cross-facet file-overlap detection |
| Sandboxing | Optional Shadow Workspace | Forced Shadow Workspace (forceShadow: true, deferPrompt: true) |
| Inter-agent coordination | None | Typed RPC bus (rpc.<peerId>.<method> tools generated per-batch) |
Built-in facets: general-coder · latex-writer · signal-processing · frontend · test-author · technical-writer · security-reviewer · data-engineer. Add project-local facets under <workspace>/.sidecar/facets/*.md or user facets via the sidecar.facets.registry setting.
Multi-facet batches run with bounded parallelism (sidecar.facets.maxConcurrent, default 3) and respect dependsOn edges in the facet registry — a facet with dependsOn: ["general-coder"] starts after general-coder finishes. Each facet’s shadow-captured diff is presented in a single batched review UI instead of firing N separate accept/reject prompts during the run.
See Extending SideCar — Facets for the schema and Slash Commands — Facets for the dispatch flow.
Plan mode
Enable sidecar.planMode to have SideCar generate a plan before executing any tools. You review and approve the plan, then SideCar executes it. Useful for complex tasks where you want to validate the approach first.
Thinking / reasoning
Models that support reasoning output show their thinking in collapsible “Reasoning” blocks:
- Anthropic API: Extended thinking blocks stream automatically
- Ollama models:
<think>...</think>tags (used by qwen3, deepseek-r1) are parsed and displayed as reasoning blocks
Set sidecar.expandThinking to true to show reasoning blocks expanded by default.
Reasoning timeline
Since v0.45.0, reasoning is rendered as a numbered step timeline rather than one growing pre-formatted block. Each thinking block closes out when a tool call starts, so a typical agent turn renders as a sequence of discrete segments:
[1. 🧠 Reasoning — 2.3s]
[2. 🔧 read_file — 180ms]
[3. 🧠 Reasoning — 1.1s]
[4. 🔧 edit_file — 410ms]
[5. 🧠 Reasoning — 0.9s]
Each segment gets a numbered pill in the summary row (purple for reasoning, blue for tools) via a CSS counter on the messages container. Duration badges show elapsed wall-clock time — steps under 500ms hide the badge to reduce visual noise. The segmentation makes it easier to see where the agent spent its time and which reasoning led to which tool call.
The sidecar.expandThinking setting still controls whether reasoning segments are open or collapsed by default; it applies to every segment in the timeline.
Verbose mode
Enable sidecar.verboseMode or type /verbose to see detailed agent internals:
- System prompt — the full assembled prompt shown at the start of each run
- Iteration summaries — elapsed time and token counts per iteration
- Tool selection — which tool was chosen and why
Verbose output appears in yellow-bordered collapsible blocks. Use /prompt to inspect the system prompt at any time.
Progress indicators
During agent runs, the UI shows:
- Step count and max iterations
- Elapsed time
- Token usage
- Animated progress bar and tool execution pulses