Doc Index, Semantic Search & Agent Memory

SideCar uses three retrieval systems to improve accuracy and consistency:

Doc Index — keyword-tokenized paragraph index over README / docs/ / wiki/. Fast, cheap, and tuned for human-written prose where exact term matches win.
Semantic Search — ONNX all-MiniLM-L6-v2 embeddings over workspace files with cosine similarity. Tuned for code where embeddings match intent across files that share no keywords.
Agent Memory — persistent pattern store that learns from successful tool invocations and injects relevant memories into future turns.

Note on naming: earlier docs called the Doc Index “RAG”, which was misleading — it’s a keyword paragraph index, not a retrieval-augmented-generation pipeline with embeddings, chunking, and reranking. The Semantic Search feature (below) uses real embeddings over code. A future retriever-fusion layer will merge results from both sources with reciprocal-rank scoring instead of concatenating them.
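
The fusion layer mentioned above does not exist yet, so the following is only a sketch of standard reciprocal-rank fusion, not SideCar code. The function name, the constant k = 60 (a commonly used default), and the sample result lists are all assumptions made for illustration.

```typescript
// Reciprocal rank fusion (RRF): every list contributes 1 / (k + rank) per item,
// so items that rank well in several lists rise to the top.
// k = 60 is a commonly used constant, not a SideCar setting.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from the two retrievers.
const docIndexHits = ["docs/AUTHENTICATION.md", "README.md"];
const semanticHits = ["src/auth/jwt.ts", "docs/AUTHENTICATION.md"];
const fused = reciprocalRankFusion([docIndexHits, semanticHits]);
```

Because docs/AUTHENTICATION.md appears in both lists, it ends up ahead of items that appear in only one, which is the behavior plain concatenation cannot provide.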

Semantic Search

SideCar embeds your workspace files using a local ONNX model (all-MiniLM-L6-v2, 384-dimensional) and searches by cosine similarity. This means a query like “authentication logic” finds src/auth/jwt.ts even when there’s no keyword match in the file path or conversation history.

How it works

Indexing — after the workspace index is built, SideCar downloads the embedding model (~23MB, cached in .sidecar/cache/models/) and embeds each file’s path + first 2048 characters
Caching — embeddings are stored as a binary Float32Array in .sidecar/cache/embeddings.bin with content hashes, so files are only re-embedded when they change
Querying — each user message is embedded and compared against all file vectors by cosine similarity
Scoring — semantic similarity is blended with heuristic scoring (path matching, recency, conversation context) using a configurable weight (default 0.6)
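
The querying and scoring steps can be sketched roughly as follows. This is a minimal illustration, not SideCar's implementation: the function names are hypothetical, and the linear blend formula is an assumption about how the weight is applied.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors have no direction
}

// Assumed blend: weight * semantic + (1 - weight) * heuristic,
// with the weight clamped to [0, 1].
function blendScore(semantic: number, heuristic: number, weight = 0.6): number {
  const w = Math.min(1, Math.max(0, weight));
  return w * semantic + (1 - w) * heuristic;
}
```

Under this assumption, a weight of 0 ignores embeddings entirely and a weight of 1 ignores the heuristic score, which matches the documented meaning of the blend ratio.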

Configuration

Setting	Default	Description
`sidecar.enableSemanticSearch`	`true`	Enable ONNX-based semantic file search
`sidecar.semanticSearchWeight`	`0.6`	Blend ratio (0 = heuristic scoring only, 1 = embeddings only)

The model loads lazily in the background. Until it’s ready, SideCar falls back to heuristic scoring alone (path matching, recency, conversation context), so file search keeps working while the model loads.

Doc Index: Automatic Documentation Retrieval

What It Does

The Doc Index automatically discovers and indexes your project’s documentation, then retrieves relevant sections for every user message using keyword scoring. This helps the agent understand your project’s conventions, architecture, and best practices without requiring you to manually paste documentation into every chat.

This is keyword retrieval, not embedding RAG. Queries are tokenized (split on camelCase, snake_case, whitespace, punctuation) and scored by shared token count, with headings weighted 3x over body text. No vectors, no chunking, no reranking. For semantic similarity across code files, use Semantic Search above — the two features are complementary, not redundant.
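
A minimal sketch of the tokenization and scoring described above, under stated assumptions: the names `tokenize`, `DocEntry`, and `scoreEntry` are hypothetical, the splitter handles ASCII text only, and a token found in both heading and body is counted once at the heading weight.

```typescript
// Split on whitespace, punctuation, snake_case, and camelCase boundaries.
// ASCII-only simplification: non-ASCII letters are treated as separators.
function tokenize(text: string): string[] {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // camelCase becomes "camel Case"
    .split(/[^A-Za-z0-9]+/) // whitespace, punctuation, underscores
    .filter((t) => t.length > 0)
    .map((t) => t.toLowerCase());
}

interface DocEntry {
  heading: string;
  body: string;
}

// Count distinct query tokens shared with the entry; heading hits count 3x.
function scoreEntry(query: string, entry: DocEntry): number {
  const headingTokens = new Set(tokenize(entry.heading));
  const bodyTokens = new Set(tokenize(entry.body));
  // Deduplicate so a repeated query word is not counted twice.
  const queryTokens = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);
  let score = 0;
  for (const token of queryTokens) {
    if (headingTokens.has(token)) score += 3;
    else if (bodyTokens.has(token)) score += 1;
  }
  return score;
}
```

For instance, the query "authentication tokens" against an entry headed "JWT Tokens" picks up a heading match for "tokens" and, if the body mentions authentication, a body match as well.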

How It Works

Discovery — On startup, SideCar crawls your workspace for documentation files:
- README* files in the project root
- All .md files in docs/, doc/, and wiki/ directories
Indexing — Each markdown file is parsed and indexed by:
- Headings (h1-h6) — title matches score 3x higher
- Paragraphs — body text is indexed for keyword search
Retrieval — For every user message:
- Your message is searched against the index using keyword matching
- Relevant entries are ranked by relevance score
- Top matches are injected into the system prompt
Context Injection — Matched documentation is injected after skill injection and before workspace context, respecting the remaining context budget
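
The retrieval and injection steps above could be sketched like this. It is an illustration only: `selectForInjection` is a hypothetical name, the budget is measured in characters rather than tokens for simplicity, and skipping sections that do not fit is an assumed policy.

```typescript
interface ScoredSection {
  text: string;
  score: number;
}

// Pick the highest-scoring sections, stopping at maxEntries (compare
// sidecar.ragMaxDocEntries) or when the remaining budget runs out.
function selectForInjection(
  sections: ScoredSection[],
  maxEntries: number,
  remainingBudget: number
): ScoredSection[] {
  const ranked = sections
    .filter((s) => s.score > 0) // filter() returns a copy, so the input is not reordered
    .sort((a, b) => b.score - a.score);
  const selected: ScoredSection[] = [];
  let budget = remainingBudget;
  for (const section of ranked) {
    if (selected.length >= maxEntries) break;
    if (section.text.length > budget) continue; // skip sections that do not fit
    selected.push(section);
    budget -= section.text.length;
  }
  return selected;
}
```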

Example

Your documentation (docs/AUTHENTICATION.md):

# Authentication

## JWT Tokens

We use JWT tokens for stateless authentication. Tokens are signed with the RS256 algorithm.

- Token format: `Bearer <jwt>`
- Expiration: 24 hours
- Refresh via `/api/auth/refresh` endpoint

Your request:

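“How should I implement login authentication?”

What happens:

SideCar tokenizes the message and searches the Doc Index; the token “authentication” matches the top-level heading in docs/AUTHENTICATION.md and the body of its JWT Tokens section
Because heading matches score 3x, docs/AUTHENTICATION.md ranks with high relevance
The JWT token section is injected into the system prompt
The agent now has context about your authentication scheme and can suggest appropriate code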

Configuration

The Doc Index is enabled by default but fully configurable:

Setting	Default	Description
`sidecar.enableDocumentationRAG`	`true`	Enable/disable the Doc Index. Key is named `...RAG` for backward compatibility with existing user configs; it controls the keyword-based index, not embedding RAG.
`sidecar.ragMaxDocEntries`	`5`	Max documentation sections per message (1-20)
`sidecar.ragUpdateIntervalMinutes`	`60`	Re-index documentation every N minutes (5-360, or 0 to disable)

Tips

Keep docs up-to-date: the Doc Index is only as good as your documentation. Update README and docs/ when conventions change
Use headings: Documentation is indexed by heading level. Use clear, descriptive headings for better retrieval
Organize by topic: Create separate files or sections for different domains (Authentication, API, Database, etc.)
Include examples: Code examples in docs are indexed along with text, helping the agent suggest relevant patterns

Agent Memory: Persistent Learning

What It Does

Agent memory learns from your coding patterns and automatically remembers them across sessions. When the agent successfully uses a tool or follows a convention, it records that pattern. On future messages, relevant learned patterns are injected into the context to improve consistency and decision-making.

How It Works

Recording — During agent runs, tool executions are automatically recorded:
- Successes are stored as pattern memories with tool name and input
- Failures are stored as failure memories with error details
- Tool chains — sequences of 3+ tools used together in a session are stored as toolchain memories (e.g. read_file → edit_file → get_diagnostics)
- Context metadata is stored (timestamp, relevance category)
- Entry is persisted to .sidecar/memory/agent-memories.json
Searching — For every user message:
- Your message is searched against stored patterns
- Results are ranked by relevance and use-count
- recordUse() is called automatically on retrieved memories, keeping use-counts accurate
- Top matches are formatted and injected into context
Scoring — Memories have multiple importance signals:
- Use-count: Automatically incremented each time a memory is retrieved. Frequent patterns score higher
- Recency: Newer patterns are boosted in search results (linear decay over 7 days)
- Co-occurrence: Tool chain memories power suggestNextTools(), which recommends likely next tools based on past sequences
Eviction — When the memory store reaches its limit (default 500 entries):
- Entries with lowest combined use-count + recency score are evicted first
- Most-used and most-recent patterns are preserved
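
The scoring and eviction rules above might look like the following sketch. The names are hypothetical, and the way use-count and recency are combined (a plain sum) is an assumption; the source only says the two signals are combined.

```typescript
interface MemoryEntry {
  id: string;
  useCount: number;
  timestamp: number; // milliseconds since epoch
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Recency boost decays linearly from 1 (just recorded) to 0 (7 or more days old).
function recencyBoost(timestamp: number, now: number): number {
  const age = Math.max(0, now - timestamp);
  return Math.max(0, 1 - age / SEVEN_DAYS_MS);
}

// Assumed combination: raw use-count plus the recency boost.
function retentionScore(entry: MemoryEntry, now: number): number {
  return entry.useCount + recencyBoost(entry.timestamp, now);
}

// Keep the maxEntries highest-scoring memories; the rest are evicted.
function evict(entries: MemoryEntry[], maxEntries: number, now: number): MemoryEntry[] {
  return entries
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => retentionScore(b, now) - retentionScore(a, now))
    .slice(0, maxEntries);
}
```

With this combination, a memory that is a week old but has been used three times outlives a brand-new memory that has never been retrieved.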

Memory Types

Memories are categorized by type to organize learning:

Patterns — Successful tool uses, common approaches for specific tasks
Failures — Tool executions that produced errors, helping the agent avoid repeating mistakes
Tool chains — Sequences of tools used together successfully (e.g. read_file → edit_file → get_diagnostics)
Decisions — Architectural choices, coding conventions, established practices
Conventions — Project-specific naming patterns, folder structures, file organization
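
To illustrate how tool chain memories could drive next-tool suggestions, here is a small frequency-count sketch. It is not the actual suggestNextTools() implementation, whose signature is not documented here; the function name, parameters, and the tool names run_tests and search_files are hypothetical.

```typescript
// Count which tools most often directly follow `current` across stored chains.
function suggestNextToolsSketch(chains: string[][], current: string, limit = 3): string[] {
  const counts = new Map<string, number>();
  for (const chain of chains) {
    for (let i = 0; i < chain.length - 1; i++) {
      if (chain[i] === current) {
        const next = chain[i + 1];
        counts.set(next, (counts.get(next) ?? 0) + 1);
      }
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1]) // most frequent follower first
    .slice(0, limit)
    .map(([tool]) => tool);
}

// Hypothetical stored chains.
const chains = [
  ["read_file", "edit_file", "get_diagnostics"],
  ["read_file", "edit_file", "run_tests"],
  ["read_file", "search_files", "edit_file"],
];
const suggestions = suggestNextToolsSketch(chains, "read_file");
```

Here edit_file follows read_file twice and search_files once, so edit_file is suggested first.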

Example pattern:

{
  "id": "mem-1234",
  "type": "pattern",
  "category": "tool:edit_file",
  "content": "Successfully used edit_file with search/replace strategy on TypeScript files",
  "context": {
    "timestamp": "2026-04-09T10:30:00Z",
    "useCount": 3
  }
}
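
If you read the memory file from your own tooling, a type and runtime check along these lines may help. The shape mirrors the example above; the interface name, the guard function, and the exact type strings for decisions and conventions are assumptions, and real entries may carry additional fields.

```typescript
type MemoryType = "pattern" | "failure" | "toolchain" | "decision" | "convention";

interface AgentMemory {
  id: string;
  type: MemoryType;
  category: string;
  content: string;
  context: { timestamp: string; useCount: number };
}

const MEMORY_TYPES: string[] = ["pattern", "failure", "toolchain", "decision", "convention"];

// Runtime check for values parsed from JSON, which arrive untyped.
function isAgentMemory(value: unknown): value is AgentMemory {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const type = v["type"];
  const ctx = v["context"] as Record<string, unknown> | null | undefined;
  return (
    typeof v["id"] === "string" &&
    typeof type === "string" &&
    MEMORY_TYPES.indexOf(type) !== -1 &&
    typeof v["category"] === "string" &&
    typeof v["content"] === "string" &&
    typeof ctx === "object" &&
    ctx !== null &&
    typeof ctx["timestamp"] === "string" &&
    typeof ctx["useCount"] === "number"
  );
}
```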

Persistence

Agent memory is stored as JSON in:

.sidecar/memory/agent-memories.json

The file is automatically:

Created on first memory recording
Loaded when SideCar starts (asynchronously)
Updated after every new memory or use-count increment

You can safely delete this file at any time to reset learning. It will be recreated automatically.

Configuration

Agent memory is enabled by default:

Setting	Default	Description
`sidecar.enableAgentMemory`	`true`	Enable/disable agent memory
`sidecar.agentMemoryMaxEntries`	`500`	Max memories to retain (10-500)

Tips

Let it learn: Don’t worry about memory size — the agent will record patterns automatically as you work
Clear if stale: If you want to reset learned patterns (e.g., after major refactoring), delete .sidecar/memory/agent-memories.json
Review recordings: For visibility into what the agent has learned, check the JSON file directly
Combine with the Doc Index: Agent memory works alongside the Doc Index. The index surfaces documented knowledge; memory surfaces learned patterns.

Doc Index + Semantic Search + Memory Together

The three systems complement each other, and they’re deliberately separate so each can specialize:

Doc Index surfaces official knowledge from your markdown documentation via keyword matching (exact term wins).
Semantic Search surfaces relevant code files via embedding similarity (intent wins — “auth flow” finds jwt.ts).
Agent Memory adds learned patterns from actual tool usage across prior sessions.

All three are searched and injected for every message. The agent can cross-reference documented conventions with semantically relevant code and with learned patterns from prior work.

Example Workflow

Session 1: You ask the agent to implement a user authentication service

Doc Index retrieves docs/AUTHENTICATION.md by matching the word “authentication”
Semantic Search surfaces src/auth/jwt.ts by embedding similarity even though your query doesn’t mention JWT
Agent reads both, then writes the new service consistent with your existing code
A pattern is recorded: “Successfully used JWT for authentication in TypeScript”

Session 2: You reload VS Code and ask the agent to add login authentication to a new service

Doc Index retrieves the same docs/AUTHENTICATION.md, again by matching “authentication”
Semantic Search retrieves src/auth/jwt.ts plus the authentication service written in session 1
Agent Memory retrieves the “JWT authentication” pattern
Agent has the spec, a working example, and a learned precedent: three complementary signals
Retrieving the pattern increments its use-count, so it ranks higher in future memory searches

Troubleshooting

The Doc Index isn’t finding my documentation

Check file locations: Documentation must be in README*, docs/**, doc/**, or wiki/**
Check file types: Only .md files are indexed
Re-index: Set sidecar.ragUpdateIntervalMinutes to 0, then set it back to the desired value to force a refresh
Verify settings: Check that sidecar.enableDocumentationRAG is true (key name kept for backward compatibility)

Agent memory seems stale

Reset if needed: Delete .sidecar/memory/agent-memories.json to start fresh
Check enable setting: Verify sidecar.enableAgentMemory is true
Watch for eviction: Once the store reaches sidecar.agentMemoryMaxEntries (default 500), the entries with the lowest combined use-count and recency score are evicted. If you lowered the limit, raise it to retain more

Too many or too few results injected

Doc Index: Adjust sidecar.ragMaxDocEntries (default 5) to inject more or fewer documentation sections
Memory: The number of retrieved memories is currently hardcoded to 5; changing it requires editing the source
Budget: Both systems respect the remaining context budget. If your workspace is large, fewer Doc Index and memory results fit
