ADR-004: FlatVectorStore over dedicated vector DB
Date: 2025-02 Status: Accepted
Context
SideCar’s semantic retrieval layer (symbol-level PKI search, episodic memory, prose chunking) needs a vector store. The options were:
- Dedicated vector DB (LanceDB, Chroma, Qdrant, Weaviate): rich query APIs, disk persistence, approximate nearest-neighbor indexes. All require native binaries or a running process — significant extension-host overhead.
- @huggingface/transformers + in-process store: embed in the Node.js extension host using ONNX Runtime Web. No external process. Requires a pure-JS/WASM store.
- FlatVectorStore: brute-force cosine scan over a Float32Array matrix. O(n) per query. Simple, zero deps beyond the embedding model.
The embedding model chosen was MiniLM-L6-v2 (384 dimensions, ~22MB ONNX). At the expected scale (≤50k symbols per workspace, ≤1k episodic entries per session), O(n) scan is fast enough: a 50k-symbol scan completes in ~30ms on a modern laptop.
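To make the cost model concrete, here is a minimal sketch of a brute-force cosine scan over a row-major Float32Array matrix. It is an illustration only, not the code in src/config/vectorStore.ts: the names `Hit`, `normalize`, and `topK` are hypothetical, and the sketch assumes vectors are L2-normalized so that a dot product equals cosine similarity.

```typescript
// Hypothetical sketch; names and layout are assumptions, not SideCar's actual code.

interface Hit {
  index: number; // row number in the matrix
  score: number; // cosine similarity to the query
}

/** L2-normalize a vector in place so that a dot product equals cosine similarity. */
function normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  if (norm > 0) {
    for (let i = 0; i < v.length; i++) v[i] /= norm;
  }
  return v;
}

/**
 * Score every row of a row-major matrix against the query and return the
 * k best matches. Rows and query are assumed to be L2-normalized already.
 * The loop touches each row once, which is the O(n) cost per query.
 */
function topK(matrix: Float32Array, dim: number, query: Float32Array, k: number): Hit[] {
  const rows = matrix.length / dim;
  const hits: Hit[] = [];
  for (let r = 0; r < rows; r++) {
    const offset = r * dim;
    let dot = 0;
    for (let i = 0; i < dim; i++) dot += matrix[offset + i] * query[i];
    hits.push({ index: r, score: dot });
  }
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, k);
}
```

At the stated upper bound, the raw matrix holds 50,000 × 384 four-byte floats, which is 76,800,000 bytes (about 77 MB) before any metadata.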
Decision
SideCar uses FlatVectorStore<M> (src/config/vectorStore.ts) as the sole vector backend. The VectorStore<M> interface is defined so a future LanceDB adapter can drop in without changing callers.
Persistence is via persist()/restore() methods that write/read a JSON file. The sidecarDir parameter accepts null for session-only stores (episodic memory) that don’t need persistence.
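The following sketch shows one way the interface and the JSON persistence contract could fit together. Only the names VectorStore<M>, persist(), restore(), and sidecarDir come from this ADR. The class name `JsonFlatStore`, the `add` and `size` methods, the `StoredEntry` shape, and the `vectors.json` file name are assumptions made for illustration.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical on-disk shape. JSON has no typed arrays, so vectors are plain arrays.
interface StoredEntry<M> {
  id: string;
  vector: number[];
  meta: M;
}

// Assumed minimal surface; the real interface may expose more (for example, search).
interface VectorStore<M> {
  add(id: string, vector: Float32Array, meta: M): void;
  size(): number;
  persist(): void;
  restore(): void;
}

class JsonFlatStore<M> implements VectorStore<M> {
  private entries: StoredEntry<M>[] = [];
  private readonly sidecarDir: string | null;
  private readonly fileName: string;

  // sidecarDir === null marks a session-only store: persist/restore become no-ops.
  constructor(sidecarDir: string | null, fileName: string = "vectors.json") {
    this.sidecarDir = sidecarDir;
    this.fileName = fileName;
  }

  add(id: string, vector: Float32Array, meta: M): void {
    this.entries.push({ id, vector: Array.from(vector), meta });
  }

  size(): number {
    return this.entries.length;
  }

  persist(): void {
    if (this.sidecarDir === null) return;
    fs.mkdirSync(this.sidecarDir, { recursive: true });
    fs.writeFileSync(path.join(this.sidecarDir, this.fileName), JSON.stringify(this.entries));
  }

  restore(): void {
    if (this.sidecarDir === null) return;
    const file = path.join(this.sidecarDir, this.fileName);
    if (!fs.existsSync(file)) return; // nothing persisted yet
    this.entries = JSON.parse(fs.readFileSync(file, "utf8")) as StoredEntry<M>[];
  }
}
```

Because callers depend only on the interface, an episodic-memory store can be constructed with `null` and a persistent store with a directory path, with no other difference at the call site.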
A Merkle tree layer (src/config/merkleTree.ts) sits above the flat store to prune the candidate set before scoring: subtrees whose aggregated embedding is far from the query are skipped, reducing the effective scan to ~5–15% of the total corpus on typical queries.
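The pruning idea can be sketched as a tree walk that skips any subtree whose aggregated embedding scores below a similarity threshold. This is a simplified illustration under stated assumptions, not the logic in src/config/merkleTree.ts: the node shapes, the `minSim` threshold, and the function names are hypothetical, the aggregate is assumed to be a normalized centroid, and the hashing side of the Merkle tree is omitted.

```typescript
// Hypothetical node shapes; hashes are left out because only pruning is shown.
interface Leaf {
  kind: "leaf";
  index: number; // row in the flat store
}

interface Branch {
  kind: "branch";
  centroid: Float32Array; // assumed: normalized aggregate of descendant embeddings
  children: TreeNode[];
}

type TreeNode = Leaf | Branch;

function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

/**
 * Collect the row indices that survive pruning. A branch whose centroid is
 * less similar to the query than minSim is skipped along with everything
 * beneath it, so its rows are never scored by the flat scan.
 */
function collectCandidates(
  node: TreeNode,
  query: Float32Array,
  minSim: number,
  out: number[] = []
): number[] {
  if (node.kind === "leaf") {
    out.push(node.index);
    return out;
  }
  if (dot(node.centroid, query) < minSim) return out; // prune this subtree
  for (const child of node.children) collectCandidates(child, query, minSim, out);
  return out;
}
```

Pruning on an aggregate is a heuristic: a relevant row inside a skipped subtree is missed, so the threshold trades recall for scan size.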
Consequences
Positive:
- Zero native binary dependencies — extension installs cleanly on all platforms without a post-install build step
- Single in-process store — no IPC, no process management, no port conflicts
- VectorStore<M> interface means the decision is reversible; LanceDB backend reserved in config (sidecar.projectKnowledge.backend: 'flat' | 'lance')
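As a sketch of what that reversibility could look like at the call site, a small factory can map the configured backend value to an implementation. Only the 'flat' | 'lance' values come from this ADR. The `createStore` function, the `InMemoryStore` class, and the trimmed interface below are hypothetical.

```typescript
// Values of sidecar.projectKnowledge.backend, as listed in this ADR.
type Backend = "flat" | "lance";

// Trimmed, assumed interface; just enough to show the dispatch.
interface VectorStore<M> {
  add(id: string, vector: Float32Array, meta: M): void;
  size(): number;
}

// Stand-in for the flat backend (hypothetical).
class InMemoryStore<M> implements VectorStore<M> {
  private readonly items: { id: string; vector: Float32Array; meta: M }[] = [];

  add(id: string, vector: Float32Array, meta: M): void {
    this.items.push({ id, vector, meta });
  }

  size(): number {
    return this.items.length;
  }
}

/** Pick a backend from config; callers only ever see VectorStore<M>. */
function createStore<M>(backend: Backend): VectorStore<M> {
  switch (backend) {
    case "flat":
      return new InMemoryStore<M>();
    case "lance":
      // Reserved value: no adapter exists in this sketch.
      throw new Error("lance backend is reserved but not implemented");
  }
}
```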
Negative:
- O(n) scan degrades on very large codebases (>200k symbols). The Merkle pruning layer partially mitigates this but is not a substitute for approximate nearest-neighbor at scale.
- No HNSW or IVF index — cannot support the approximate recall + speed tradeoffs that production RAG systems expect
- WASM ONNX Runtime adds ~15MB to the extension bundle; first-embed call is slow (~1–2s) due to WASM JIT compilation