ADR-001: Local-first architecture via Ollama
Date: 2024-09
Status: Accepted
Context
Agent-mode coding assistants (Copilot, Cursor, Claude Code) all require paid cloud API access. A VS Code extension that wires directly into a local LLM server can deliver the same agent-mode workflow at zero marginal cost. The primary target backend is Ollama, which runs on consumer hardware and exposes an OpenAI-compatible HTTP API at localhost:11434.
The decision also needed to account for developers who work air-gapped, handle sensitive codebases, or simply want no data leaving their machine.
Decision
SideCar is local-first: Ollama is the default backend, and every feature must work without internet access. Cloud backends (Anthropic, OpenAI, OpenRouter, Groq, Fireworks, Kickstand) are supported as optional overlays through the same ApiBackend interface, selectable via sidecar.baseUrl. No feature may silently require a cloud call when the user is on a local backend.
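The "no silent cloud call" rule can be illustrated with a small guard. This is only a sketch: isLocalBackend and assertCloudAllowed are hypothetical names, not part of SideCar, and the sketch assumes that "local" means a loopback host in sidecar.baseUrl (a LAN-hosted server would need a broader rule).

```typescript
// Hypothetical helper: treats only loopback hosts as "local".
// The WHATWG URL parser reports IPv6 loopback as "[::1]", brackets included.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLocalBackend(baseUrl: string): boolean {
  return LOOPBACK_HOSTS.has(new URL(baseUrl).hostname);
}

// Hypothetical guard: a feature that needs a cloud call fails loudly
// instead of silently reaching out while the user is on a local backend.
function assertCloudAllowed(baseUrl: string, feature: string): void {
  if (isLocalBackend(baseUrl)) {
    throw new Error(
      `${feature} needs a cloud backend, but sidecar.baseUrl is local (${baseUrl})`
    );
  }
}
```

A feature that depends on a cloud service would call assertCloudAllowed before making any request, so the failure is visible to the user rather than hidden.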
The ApiBackend abstraction (src/ollama/backend.ts) keeps the agent loop backend-agnostic: streamChat(model, systemPrompt, messages, signal, tools) is the only method the loop calls.
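A minimal sketch of what such an interface might look like follows. Only ApiBackend and the streamChat parameter order come from this ADR; ChatMessage, ToolDefinition, StreamEvent, EchoBackend, and runTurn are hypothetical stand-ins, and the real types in src/ollama/backend.ts may differ.

```typescript
// Hypothetical supporting types; the ADR does not specify their shape.
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface ToolDefinition {
  name: string;
  description: string;
}

type StreamEvent =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; args: Record<string, unknown> };

// The single method the agent loop depends on (parameter order from the ADR).
interface ApiBackend {
  streamChat(
    model: string,
    systemPrompt: string,
    messages: ChatMessage[],
    signal: AbortSignal,
    tools: ToolDefinition[]
  ): AsyncIterable<StreamEvent>;
}

// Stand-in backend that streams the last message back in word-sized chunks.
class EchoBackend implements ApiBackend {
  async *streamChat(
    _model: string,
    _systemPrompt: string,
    messages: ChatMessage[],
    signal: AbortSignal,
    _tools: ToolDefinition[]
  ): AsyncGenerator<StreamEvent> {
    const last = messages[messages.length - 1]?.content ?? "";
    for (const chunk of last.match(/\S+\s*/g) ?? []) {
      if (signal.aborted) return; // stop streaming once cancelled
      yield { type: "text", text: chunk };
    }
  }
}

// Hypothetical single turn of an agent loop: it sees only ApiBackend,
// never a concrete provider.
async function runTurn(
  backend: ApiBackend,
  prompt: string,
  signal: AbortSignal
): Promise<string> {
  const messages: ChatMessage[] = [{ role: "user", content: prompt }];
  let reply = "";
  for await (const event of backend.streamChat(
    "any-model", "You are a coding assistant.", messages, signal, []
  )) {
    if (event.type === "text") reply += event.text;
  }
  return reply;
}
```

Because runTurn takes any ApiBackend, swapping Ollama for a cloud provider would mean supplying a different implementation, with no change to the loop itself.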
Consequences
Positive:
- Zero subscription cost for users on Ollama
- Full offline operation — no telemetry, no data egress
- Easy to test locally; no API keys needed for development
- Broadens audience beyond developers who can expense cloud credits
Negative:
- Local model capability ceiling is lower than frontier models for complex tasks; the extension must be robust to weaker tool-calling behavior (fallback text parsers, cycle detection, stub guards)
- Ollama’s API surface differs from Anthropic’s in ways that require per-backend adapters (prompt caching, thinking blocks, structured tool schemas)
- Feature gating by provider (e.g. delegate_task only on paid backends, extended thinking only on Anthropic) adds complexity to tool registration
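To make the feature-gating cost concrete, here is one way provider-dependent tool registration could be sketched. BackendCapabilities, ToolSpec, registerTools, supportsExtendedThinking, and the read_file tool are all hypothetical; the ADR states only that delegate_task is limited to paid backends and extended thinking to Anthropic.

```typescript
// Hypothetical capability description for the active backend.
interface BackendCapabilities {
  provider: string; // e.g. "ollama" or "anthropic" (illustrative values)
  paid: boolean;
}

// A tool with an optional predicate that decides whether it may be registered.
interface ToolSpec {
  name: string;
  requires?: (caps: BackendCapabilities) => boolean;
}

const ALL_TOOLS: ToolSpec[] = [
  { name: "read_file" }, // hypothetical ungated tool
  { name: "delegate_task", requires: (caps) => caps.paid }, // paid backends only
];

// Returns the names of the tools that the current backend is allowed to see.
function registerTools(
  caps: BackendCapabilities,
  tools: ToolSpec[] = ALL_TOOLS
): string[] {
  return tools
    .filter((tool) => tool.requires === undefined || tool.requires(caps))
    .map((tool) => tool.name);
}

// Extended thinking is gated on the provider rather than on a tool.
function supportsExtendedThinking(caps: BackendCapabilities): boolean {
  return caps.provider === "anthropic";
}
```

Keeping each gate as a predicate next to the tool definition puts the rules in one place, but every new provider-specific feature still adds another condition to maintain, which is the complexity noted above.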