Doc Index, Semantic Search & Agent Memory
SideCar uses three retrieval systems to improve accuracy and consistency:
- Doc Index: keyword-tokenized paragraph index over README, `docs/`, and `wiki/`. Fast, cheap, and tuned for human-written prose where exact term matches win.
- Semantic Search: ONNX `all-MiniLM-L6-v2` embeddings over workspace files with cosine similarity. Tuned for code, where embeddings match intent across files that share no keywords.
- Agent Memory: persistent pattern store that learns from successful tool invocations and injects relevant memories into future turns.
Note on naming: earlier docs called the Doc Index “RAG”, which was misleading — it’s a keyword paragraph index, not a retrieval-augmented-generation pipeline with embeddings, chunking, and reranking. The Semantic Search feature (below) uses real embeddings over code. A future retriever-fusion layer will merge results from both sources with reciprocal-rank scoring instead of concatenating them.
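For readers unfamiliar with reciprocal-rank scoring, here is a minimal sketch of the general technique. It is not SideCar code, since the fusion layer does not exist yet. The function name `reciprocalRankFusion`, the constant `k = 60` (a commonly used default, not a SideCar setting), and the sample result lists are all assumptions for illustration.

```typescript
// Reciprocal-rank fusion: each source contributes 1 / (k + rank) for every
// item it returns, and items are ordered by the sum of those contributions.
function reciprocalRankFusion(rankings: string[][], k = 60): [string, number][] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from the two retrievers, best match first.
const docIndexHits = ["docs/AUTHENTICATION.md", "README.md"];
const semanticHits = ["src/auth/jwt.ts", "docs/AUTHENTICATION.md"];

const fused = reciprocalRankFusion([docIndexHits, semanticHits]);
console.log(fused.map(([id]) => id));
```

An item returned by both sources accumulates two contributions, so it rises above items that only one source found. Plain concatenation cannot express that.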
Semantic Search
SideCar embeds your workspace files using a local ONNX model (all-MiniLM-L6-v2, 384-dimensional) and searches by cosine similarity. This means a query like “authentication logic” finds src/auth/jwt.ts even when there’s no keyword match in the file path or conversation history.
How it works
- Indexing: after the workspace index is built, SideCar downloads the embedding model (~23 MB, cached in `.sidecar/cache/models/`) and embeds each file’s path + first 2048 characters
- Caching: embeddings are stored as a binary Float32Array in `.sidecar/cache/embeddings.bin` with content hashes, so files are only re-embedded when they change
- Querying: each user message is embedded and compared against all file vectors by cosine similarity
- Scoring: semantic similarity is blended with heuristic scoring (path matching, recency, conversation context) using a configurable weight (default 0.6)
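The querying and scoring steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not SideCar's implementation: the blend is assumed to be a linear interpolation (consistent with "0 = keyword only, 1 = embeddings only"), the heuristic score is assumed to be normalized to [0, 1], and the names `cosineSimilarity`, `blendScore`, `rankFiles`, and `FileCandidate` are hypothetical.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Assumed linear blend: weight 0 keeps only the heuristic score,
// weight 1 keeps only the embedding score.
function blendScore(semantic: number, heuristic: number, weight = 0.6): number {
  return weight * semantic + (1 - weight) * heuristic;
}

interface FileCandidate {
  path: string;
  vector: Float32Array; // 384 values in practice; 3 here for brevity
  heuristic: number; // hypothetical heuristic score in [0, 1]
}

// Rank candidate files by blended score, best first.
function rankFiles(query: Float32Array, files: FileCandidate[], weight = 0.6): string[] {
  return files
    .map((f) => ({
      path: f.path,
      score: blendScore(cosineSimilarity(query, f.vector), f.heuristic, weight),
    }))
    .sort((x, y) => y.score - x.score)
    .map((f) => f.path);
}

const query = new Float32Array([1, 0, 0]);
const ranked = rankFiles(query, [
  { path: "src/utils/date.ts", vector: new Float32Array([0, 1, 0]), heuristic: 0.5 },
  { path: "src/auth/jwt.ts", vector: new Float32Array([0.9, 0.1, 0]), heuristic: 0.1 },
]);
console.log(ranked);
```

In this toy input, `src/auth/jwt.ts` has the weaker heuristic score but ranks first, because its vector points almost the same way as the query and the default weight favors the embedding signal.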
Configuration
| Setting | Default | Description |
|---|---|---|
| `sidecar.enableSemanticSearch` | `true` | Enable ONNX-based semantic file search |
| `sidecar.semanticSearchWeight` | `0.6` | Blend ratio (0 = keyword only, 1 = embeddings only) |
The model loads lazily in the background. Until it’s ready, SideCar falls back to keyword-based scoring with no impact on usability.
Doc Index: Automatic Documentation Retrieval
What It Does
The Doc Index automatically discovers and indexes your project’s documentation, then retrieves relevant sections for every user message using keyword scoring. This helps the agent understand your project’s conventions, architecture, and best practices without requiring you to manually paste documentation into every chat.
This is keyword retrieval, not embedding RAG. Queries are tokenized (split on camelCase, snake_case, whitespace, punctuation) and scored by shared token count, with headings weighted 3x over body text. No vectors, no chunking, no reranking. For semantic similarity across code files, use Semantic Search above — the two features are complementary, not redundant.
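The tokenization and heading weighting described above can be sketched as follows. This is a minimal illustration, not SideCar's actual code: `tokenize`, `scoreEntry`, `DocEntry`, and `HEADING_WEIGHT` are hypothetical names, the splitting is ASCII-only for brevity, and counting each distinct query token once is an assumption.

```typescript
// Split on camelCase boundaries, then on anything that is not a letter or
// digit (covers snake_case, whitespace, and punctuation). ASCII-only sketch.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

interface DocEntry {
  heading: string;
  body: string;
}

const HEADING_WEIGHT = 3; // headings are weighted 3x over body text

// Score = number of distinct query tokens shared with the entry,
// counted 3x when the token appears in the heading.
function scoreEntry(query: string, entry: DocEntry): number {
  const headingTokens = new Set(tokenize(entry.heading));
  const bodyTokens = new Set(tokenize(entry.body));
  const queryTokens = Array.from(new Set(tokenize(query)));
  let score = 0;
  for (const token of queryTokens) {
    if (headingTokens.has(token)) score += HEADING_WEIGHT;
    else if (bodyTokens.has(token)) score += 1;
  }
  return score;
}

const entry: DocEntry = {
  heading: "JWT Tokens",
  body: "Tokens are signed with the RS256 algorithm. Expiration: 24 hours",
};
console.log(tokenize("getUserToken refresh_token"));
console.log(scoreEntry("JWT token expiration", entry));
```

With exact token matching and no stemming, the query token `token` does not match `tokens` in this sketch. That is the kind of near-miss that embedding-based search is meant to cover.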
How It Works
- Discovery: On startup, SideCar crawls your workspace for documentation files:
  - `README*` files in the project root
  - All `.md` files in `docs/`, `doc/`, and `wiki/` directories
- Indexing: Each markdown file is parsed and indexed by:
  - Headings (h1-h6): title matches score 3x higher
  - Paragraphs: body text is indexed for keyword search
- Retrieval: For every user message:
  - Your message is searched against the index using keyword matching
  - Relevant entries are ranked by relevance score
  - Top matches are injected into the system prompt
- Context Injection: Matched documentation is injected after skill injection and before workspace context, respecting the remaining context budget
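The retrieval and context-injection steps can be pictured as a top-N selection under a size budget. The sketch below is an illustration only: `selectSections` and `ScoredSection` are hypothetical names, the budget is measured in characters purely for simplicity (the source does not say how SideCar measures it), and skipping sections that do not fit is an assumed policy.

```typescript
interface ScoredSection {
  source: string; // e.g. "docs/AUTHENTICATION.md"
  text: string;
  score: number;
}

// Keep the best-scoring sections, capped by entry count and by a size budget.
function selectSections(
  sections: ScoredSection[],
  maxEntries: number,
  budgetChars: number
): ScoredSection[] {
  // filter() returns a new array, so sorting does not mutate the input.
  const ranked = sections.filter((s) => s.score > 0).sort((a, b) => b.score - a.score);
  const selected: ScoredSection[] = [];
  let used = 0;
  for (const section of ranked) {
    if (selected.length >= maxEntries) break;
    if (used + section.text.length > budgetChars) continue; // skip what does not fit
    selected.push(section);
    used += section.text.length;
  }
  return selected;
}

const selected = selectSections(
  [
    { source: "a.md", text: "0123456789", score: 4 },
    { source: "b.md", text: "this section is far too long for the budget", score: 3 },
    { source: "c.md", text: "short", score: 1 },
    { source: "d.md", text: "x", score: 0 },
  ],
  5, // mirrors the default of sidecar.ragMaxDocEntries
  20
);
console.log(selected.map((s) => s.source));
```

Here `b.md` scores well but is dropped because it would exceed the budget, and `d.md` is dropped because it shares no tokens with the query.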
Example
Your documentation (docs/AUTHENTICATION.md):
# Authentication
## JWT Tokens
We use JWT tokens for stateless authentication. Tokens are signed with the RS256 algorithm.
- Token format: `Bearer <jwt>`
- Expiration: 24 hours
- Refresh via `/api/auth/refresh` endpoint
Your request:
“How should I implement login?”
What happens:
- SideCar searches the docs index for “login” and “authentication”
- Finds `docs/AUTHENTICATION.md` with high relevance
- The JWT token section is injected into the system prompt
- The agent now has context about your authentication scheme and can suggest appropriate code
Configuration
The Doc Index is enabled by default but fully configurable:
| Setting | Default | Description |
|---|---|---|
| `sidecar.enableDocumentationRAG` | `true` | Enable/disable the Doc Index. Key is named `...RAG` for backward compatibility with existing user configs; it controls the keyword-based index, not embedding RAG. |
| `sidecar.ragMaxDocEntries` | `5` | Max documentation sections per message (1-20) |
| `sidecar.ragUpdateIntervalMinutes` | `60` | Re-index documentation every N minutes (5-360, or 0 to disable) |
Tips
- Keep docs up-to-date: the Doc Index is only as good as your documentation. Update README and docs/ when conventions change
- Use headings: Documentation is indexed by heading level. Use clear, descriptive headings for better retrieval
- Organize by topic: Create separate files or sections for different domains (Authentication, API, Database, etc.)
- Include examples: Code examples in docs are indexed along with text, helping the agent suggest relevant patterns
Agent Memory: Persistent Learning
What It Does
Agent memory learns from your coding patterns and automatically remembers them across sessions. When the agent successfully uses a tool or follows a convention, it records that pattern. On future messages, relevant learned patterns are injected into the context to improve consistency and decision-making.
How It Works
- Recording: During agent runs, tool executions are automatically recorded:
  - Successes are stored as `pattern` memories with tool name and input
  - Failures are stored as `failure` memories with error details
  - Tool chains: sequences of 3+ tools used together in a session are stored as `toolchain` memories (e.g. `read_file → edit_file → get_diagnostics`)
  - Context metadata is stored (timestamp, relevance category)
  - Entry is persisted to `.sidecar/memory/agent-memories.json`
- Searching: For every user message:
  - Your message is searched against stored patterns
  - Results are ranked by relevance and use-count
  - `recordUse()` is called automatically on retrieved memories, keeping use-counts accurate
  - Top matches are formatted and injected into context
- Scoring: Memories have multiple importance signals:
  - Use-count: Automatically incremented each time a memory is retrieved. Frequent patterns score higher
  - Recency: Newer patterns are boosted in search results (linear decay over 7 days)
  - Co-occurrence: Tool chain memories power `suggestNextTools()`, which recommends likely next tools based on past sequences
- Eviction: When the memory store reaches its limit (default 500 entries):
  - Entries with lowest combined use-count + recency score are evicted first
  - Most-used and most-recent patterns are preserved
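The recency decay and eviction rules above can be sketched as follows. This is an illustration under assumptions, not SideCar's code: the source says the decay is linear over 7 days and that eviction uses a combined use-count + recency score, but the exact formula is not documented, so the plain sum in `retentionScore` is a guess. The names `Memory`, `recencyBoost`, `retentionScore`, and `evict` are hypothetical.

```typescript
interface Memory {
  id: string;
  useCount: number;
  timestamp: string; // ISO 8601
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Linear decay: 1 for a brand-new memory, 0 once it is 7 or more days old.
function recencyBoost(timestamp: string, now: Date): number {
  const ageDays = (now.getTime() - Date.parse(timestamp)) / DAY_MS;
  return Math.min(1, Math.max(0, 1 - ageDays / 7));
}

// Assumed combination: raw use-count plus the recency boost.
function retentionScore(memory: Memory, now: Date): number {
  return memory.useCount + recencyBoost(memory.timestamp, now);
}

// Keep the highest-scoring entries; the lowest-scoring ones are evicted.
function evict(memories: Memory[], maxEntries: number, now: Date): Memory[] {
  if (memories.length <= maxEntries) return memories;
  return memories
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => retentionScore(b, now) - retentionScore(a, now))
    .slice(0, maxEntries);
}

const now = new Date("2026-04-16T10:30:00Z");
const kept = evict(
  [
    { id: "mem-old-unused", useCount: 0, timestamp: "2026-04-01T10:30:00Z" },
    { id: "mem-old-popular", useCount: 3, timestamp: "2026-04-09T10:30:00Z" },
    { id: "mem-new", useCount: 0, timestamp: "2026-04-15T10:30:00Z" },
  ],
  2,
  now
);
console.log(kept.map((m) => m.id));
```

In this example the old, never-retrieved memory is evicted, while the frequently used one survives despite its age and the new one survives on recency alone.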
Memory Types
Memories are categorized by type to organize learning:
- Patterns: Successful tool uses, common approaches for specific tasks
- Failures: Tool executions that produced errors, helping the agent avoid repeating mistakes
- Tool chains: Sequences of tools used together successfully (e.g. `read_file → edit_file → get_diagnostics`)
- Decisions: Architectural choices, coding conventions, established practices
- Conventions: Project-specific naming patterns, folder structures, file organization
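To show how tool chain memories could drive next-tool recommendations, here is a minimal sketch based on counting which tool most often follows the current one. It is not the implementation of `suggestNextTools()`, whose internals are not documented here. The function `suggestNextToolsSketch`, the `ToolChain` type, and the tool names `run_tests` and `search_files` are hypothetical.

```typescript
type ToolChain = string[];

// Count how often each tool directly follows `lastTool` in the stored chains
// and return the followers ordered by frequency.
function suggestNextToolsSketch(chains: ToolChain[], lastTool: string, limit = 3): string[] {
  const counts = new Map<string, number>();
  for (const chain of chains) {
    for (let i = 0; i < chain.length - 1; i++) {
      if (chain[i] === lastTool) {
        const next = chain[i + 1];
        counts.set(next, (counts.get(next) ?? 0) + 1);
      }
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([tool]) => tool);
}

const chains: ToolChain[] = [
  ["read_file", "edit_file", "get_diagnostics"],
  ["read_file", "edit_file", "run_tests"],
  ["search_files", "read_file", "edit_file", "get_diagnostics"],
];
const suggestions = suggestNextToolsSketch(chains, "edit_file");
console.log(suggestions);
```

After `edit_file`, the sketch suggests `get_diagnostics` first because it followed `edit_file` in two of the three recorded chains.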
Example pattern:
{
  "id": "mem-1234",
  "type": "pattern",
  "category": "tool:edit_file",
  "content": "Successfully used edit_file with search/replace strategy on TypeScript files",
  "context": {
    "timestamp": "2026-04-09T10:30:00Z",
    "useCount": 3
  }
}
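If you inspect or post-process memory entries yourself, a typed view of the example above can help. The sketch below mirrors the field names in that example. The `MemoryType` union is inferred from the memory types listed in this section and is an assumption (only `pattern`, `failure`, and `toolchain` appear as literal values in this page), and `AgentMemoryEntry` and `isAgentMemoryEntry` are hypothetical names, not part of SideCar's API.

```typescript
// Assumed set of type values; "decision" and "convention" are guesses.
type MemoryType = "pattern" | "failure" | "toolchain" | "decision" | "convention";

interface AgentMemoryEntry {
  id: string;
  type: MemoryType;
  category: string;
  content: string;
  context: {
    timestamp: string; // ISO 8601
    useCount: number;
  };
}

// Narrow an unknown parsed value to AgentMemoryEntry with basic shape checks.
// The `type` field is only checked to be a string, since the full set of
// values is an assumption.
function isAgentMemoryEntry(value: unknown): value is AgentMemoryEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const ctx = v.context as Record<string, unknown> | null | undefined;
  return (
    typeof v.id === "string" &&
    typeof v.type === "string" &&
    typeof v.category === "string" &&
    typeof v.content === "string" &&
    typeof ctx === "object" &&
    ctx !== null &&
    typeof ctx.timestamp === "string" &&
    typeof ctx.useCount === "number"
  );
}

const raw = JSON.stringify({
  id: "mem-1234",
  type: "pattern",
  category: "tool:edit_file",
  content: "Successfully used edit_file with search/replace strategy on TypeScript files",
  context: { timestamp: "2026-04-09T10:30:00Z", useCount: 3 },
});
const parsed: unknown = JSON.parse(raw);
const valid = isAgentMemoryEntry(parsed);
console.log(valid);
```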
Persistence
Agent memory is stored as JSON in:
.sidecar/memory/agent-memories.json
The file is automatically:
- Created on first memory recording
- Loaded when SideCar starts (asynchronously)
- Updated after every new memory or use-count increment
You can safely delete this file at any time to reset learning. It will be recreated automatically.
Configuration
Agent memory is enabled by default:
| Setting | Default | Description |
|---|---|---|
| `sidecar.enableAgentMemory` | `true` | Enable/disable agent memory |
| `sidecar.agentMemoryMaxEntries` | `500` | Max memories to retain (10-500) |
Tips
- Let it learn: Don’t worry about memory size — the agent will record patterns automatically as you work
- Clear if stale: If you want to reset learned patterns (e.g., after major refactoring), delete `.sidecar/memory/agent-memories.json`
- Review recordings: For visibility into what the agent has learned, check the JSON file directly
- Combine with the Doc Index: Agent memory works alongside the Doc Index. The index surfaces documented knowledge, memory surfaces learned patterns.
Doc Index + Semantic Search + Memory Together
The three systems work synergistically — and they’re deliberately separate so each can specialize:
- Doc Index surfaces official knowledge from your markdown documentation via keyword matching (exact term wins).
- Semantic Search surfaces relevant code files via embedding similarity (intent wins: “auth flow” finds `jwt.ts`).
- Agent Memory adds learned patterns from actual tool usage across prior sessions.
All three are searched and injected for every message. The agent can cross-reference documented conventions with semantically relevant code and with learned patterns from prior work.
Example Workflow
Session 1: You ask the agent to implement a user authentication service
- Doc Index retrieves `docs/AUTHENTICATION.md` by matching the word “authentication”
- Semantic Search surfaces `src/auth/jwt.ts` by embedding similarity even though your query doesn’t mention JWT
- Agent reads both and writes the new service consistent with your existing code
- A pattern is recorded: “Successfully used JWT for authentication in TypeScript”
Session 2: You reload VS Code and ask the agent to add login to a new service
- Doc Index retrieves the same `docs/AUTHENTICATION.md`
- Semantic Search retrieves `src/auth/jwt.ts` plus the service written in session 1
- Agent Memory retrieves the “JWT authentication” pattern
- Agent has the spec, a working example, and a learned precedent — three complementary signals
- On future messages, JWT authentication ranks higher in memory search
Troubleshooting
The Doc Index isn’t finding my documentation
- Check file locations: Documentation must be in `README*`, `docs/**`, `doc/**`, or `wiki/**`
- Check file types: Only `.md` files are indexed
- Re-index: Set `sidecar.ragUpdateIntervalMinutes` to 0, then back to the desired value, to force a refresh
- Verify settings: Check that `sidecar.enableDocumentationRAG` is `true` (key name kept for backward compatibility)
Agent memory seems stale
- Reset if needed: Delete `.sidecar/memory/agent-memories.json` to start fresh
- Check enable setting: Verify `sidecar.enableAgentMemory` is `true`
- Watch for eviction: At the configured limit (500 entries by default), the least-used and oldest patterns are evicted. If you lowered `sidecar.agentMemoryMaxEntries`, raise it (up to the documented maximum of 500) to retain more
Too much/too few results injected
- Doc Index: Adjust `sidecar.ragMaxDocEntries` (default 5) to inject more or fewer documentation sections
- Memory: The number of retrieved memories is currently hardcoded to 5; changing it requires editing the search call in the code
- Budget: Both systems respect the remaining context budget. If your workspace is large, fewer Doc Index and memory results fit
See Also
- Configuration — Full settings reference
- Architecture — How RAG and memory integrate into the system
- Large Files & Monorepos — How streaming reads work alongside RAG