Architecture
How CodeRecon works under the hood.
Overview
CodeRecon is a global multi-repo analysis daemon. A single background process manages all registered repositories. AI agents connect via a single MCP endpoint (POST /mcp) and declare their target repo at session initialization. The global catalog tracks all registered repos and worktrees.

Global Daemon Architecture
The global daemon is a single background process, listening on port 7654, that manages all registered repos. AI agents connect to one MCP endpoint (POST /mcp) and declare their target repo via _meta.repo in the MCP InitializeRequest. The daemon binds the session to that repo/worktree pair and routes all subsequent tool calls through a session binding table.
    recon up             → start the global daemon (registers current repo)
    recon global-status  → see all registered repos and worktrees
Per-call repo/worktree overrides allow cross-repo queries without opening a new session.
Daemon processes are managed via PID files in ~/.local/share/coderecon/.
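As a rough illustration of the session binding described above, the sketch below builds an initialize message that names the target repo in _meta.repo. The placement of _meta inside params, the protocol version string, and the clientInfo values are assumptions made for this example; only the _meta.repo key comes from this page.

```python
import json


def build_initialize_request(repo: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC initialize message that declares the target repo."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; use whatever your MCP client negotiates.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Hypothetical client identity, for illustration only.
            "clientInfo": {"name": "example-agent", "version": "0.1.0"},
            # From the text above: the target repo is declared via _meta.repo.
            "_meta": {"repo": repo},
        },
    }


payload = build_initialize_request("my-repo")
body = json.dumps(payload)  # this string would be the body of POST /mcp
```

In practice an MCP client library would normally build and send this message; the sketch only shows where the repo name travels.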
HTTP Routes
The daemon exposes these operator-facing REST endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Server health + active repos |
| /catalog | GET | List registered repos and worktrees |
| /catalog/register | POST | Register a repo dynamically |
| /catalog/unregister | POST | Unregister a repo |
| /repos/{name}/health | GET | Per-repo health check |
| /repos/{name}/status | GET | Per-repo status |
| /repos/{name}/reindex | POST | Trigger full re-index for a repo |
| /repos/{name}/refresh-worktrees | POST | Discover and activate new git worktrees |
| /mcp | POST | MCP tool endpoint (agent-facing, streamable HTTP) |
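The operator endpoints can be called from any HTTP client. The sketch below prepares requests with the Python standard library without sending them. The host 127.0.0.1 and the empty POST body are assumptions; the request and response payloads are not documented on this page, so none are shown.

```python
import urllib.request

# Port 7654 comes from the text above; the loopback host is an assumption.
BASE_URL = "http://127.0.0.1:7654"


def operator_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Prepare (but do not send) a request for one of the operator endpoints."""
    # Assumption: POST endpoints such as reindex accept an empty body.
    data = b"" if method == "POST" else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method)


health = operator_request("/health")
reindex = operator_request("/repos/my-repo/reindex", method="POST")

# To send a prepared request against a running daemon:
# with urllib.request.urlopen(health, timeout=5) as resp:
#     print(resp.status, resp.read().decode())
```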
Four-Tier Index
Every registered repo builds and maintains a four-tier index inside .recon/.

Tier 0: Lexical (Tantivy)
Always on. Tantivy full-text index over all source files. Used for:
- Candidate discovery (fast text search across all tokens)
- Fallback when higher tiers are unavailable (e.g., during indexing)
- Supplemental lexical matching alongside semantic results
Stored in .recon/tantivy/. Rebuilt automatically on first recon up and kept fresh via file-watch events.
Tier 1: Structural Facts (Tree-sitter + SQLite)
Parsed, structured knowledge. Each source file is parsed with Tree-sitter to extract:
| Fact type | Description |
|---|---|
| DefFact | Symbol definitions (functions, classes, variables) |
| RefFact | References to symbols, with tier/role |
| ImportFact | Import statements and resolved targets |
| ExportEntry | Exported symbols / public API surface |
| ScopeFact | Lexical scope hierarchy |
| LocalBindFact | Local variable bindings |
The import graph (who imports whom) drives the graph_* tools and the tiered test selection in checkpoint.
Tier 2: Type and Semantic Facts
Type-level structural knowledge extracted from annotations and usage patterns:
| Fact type | Description |
|---|---|
| TypeAnnotationFact | Explicit type annotations |
| TypeMemberFact | Class/struct member definitions |
| MemberAccessFact | Object member access chains (duck-typing inference) |
| InterfaceImplFact | Interface/trait implementations |
| ReceiverShapeFact | Inferred type shapes from usage patterns |
Tier 3: Behavioral Facts
Runtime and cross-cutting facts linking code to tests, coverage, and external interfaces:
| Fact type | Description |
|---|---|
| TestCoverageFact | Test → definition coverage links |
| TestReachabilityFact | Test → target reachability (static + runtime) |
| CallEdge | Materialized caller → callee edges |
| LineCoverageFact | Per-line hit counts |
| LintStatusFact | Lint/type-check diagnostics |
| EndpointFact | HTTP/RPC endpoint definitions |
| DocCrossRef | Docstring cross-references |
Search Indices
Additional indices for retrieval:
| Index | Description |
|---|---|
| SpladeVec | SPLADE sparse embeddings for definitions |
| FileChunkVec | Embeddings for non-code chunks (Markdown, YAML) |
| DocCodeEdgeFact | Semantic edges linking documentation to code |
Ranking Pipeline
Retrieval results from the recon tool are ranked using a LightGBM LambdaMART model pipeline:

- Retrieval: multiple retrievers (lexical, term match, graph, symbol) produce a candidate pool
- Gate classification: a gate model classifies the (query, repo) pair as OK, UNSAT, BROAD, or AMBIG
- File ranking: a file ranker scores files by relevance
- Definition ranking: a definition ranker scores individual definitions within files
- Cutoff prediction: a cutoff model predicts the optimal result count
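The five stages above can be sketched as one function that takes each stage as a callable. This is only an illustration of the control flow: the real stages are LightGBM models, whereas the scorers here are toy lambdas, and returning an empty result for a non-OK gate verdict is an assumption rather than documented behavior. All names (Candidate, run_pipeline, and the parameters) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    OK = "OK"
    UNSAT = "UNSAT"
    BROAD = "BROAD"
    AMBIG = "AMBIG"


@dataclass(frozen=True)
class Candidate:
    file: str
    definition: str


def run_pipeline(query, retrievers, gate, file_score, def_score, cutoff):
    # 1. Retrieval: merge candidates from every retriever, dropping duplicates.
    pool = list(dict.fromkeys(c for r in retrievers for c in r(query)))
    # 2. Gate classification (assumption: anything but OK yields no results).
    verdict = gate(query, pool)
    if verdict is not Gate.OK:
        return verdict, []
    # 3 and 4. Rank by file score first, then by definition score within it.
    ranked = sorted(
        pool,
        key=lambda c: (file_score(query, c.file), def_score(query, c)),
        reverse=True,
    )
    # 5. Cutoff prediction: keep only the predicted number of results.
    return verdict, ranked[: cutoff(query, ranked)]


# Toy stand-ins for the retrievers and models.
lexical = lambda q: [Candidate("auth.py", "login"), Candidate("db.py", "connect")]
symbol = lambda q: [Candidate("auth.py", "login"), Candidate("auth.py", "logout")]

verdict, results = run_pipeline(
    "auth",
    retrievers=[lexical, symbol],
    gate=lambda q, pool: Gate.OK if pool else Gate.UNSAT,
    file_score=lambda q, f: 1.0 if q in f else 0.0,
    def_score=lambda q, c: len(c.definition),  # toy score, not meaningful
    cutoff=lambda q, ranked: 2,
)
```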
Epoch Model and Freshness
The index uses an epoch counter to track index generations. Each full re-index bumps the epoch. Incremental updates (triggered by file changes) are applied as delta patches without bumping the epoch.
A freshness gate blocks queries while the index is stale. When a file change is detected, the gate marks the worktree as stale; once the background indexer finishes processing, it marks the worktree fresh again. Queries wait for the fresh signal before returning results.
    epoch 1 → initial index
    epoch 2 → full re-index after large change
    epoch 3 → ...
    intra-epoch deltas: tracked as "changed files" against the current epoch
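A minimal sketch of the epoch counter and freshness gate, assuming a thread-based implementation built on threading.Event. The class and method names are hypothetical, and the daemon may well use a different concurrency primitive.

```python
import threading
from typing import Optional, Set


class FreshnessGate:
    """Tracks the index epoch and blocks readers while the index is stale."""

    def __init__(self) -> None:
        self.epoch = 1  # epoch 1 is the initial index
        self.changed_files: Set[str] = set()
        self._fresh = threading.Event()
        self._fresh.set()  # a newly built index starts out fresh

    def file_changed(self, path: str) -> None:
        # Intra-epoch delta: remember the file and mark the worktree stale.
        self.changed_files.add(path)
        self._fresh.clear()

    def delta_applied(self) -> None:
        # The background indexer finished; the epoch is NOT bumped.
        self._fresh.set()

    def full_reindex(self) -> None:
        # A full re-index starts a new generation and resets the delta set.
        self.epoch += 1
        self.changed_files.clear()
        self._fresh.set()

    def wait_fresh(self, timeout: Optional[float] = None) -> bool:
        """Block until the index is fresh; False means the wait timed out."""
        return self._fresh.wait(timeout)


gate = FreshnessGate()
gate.file_changed("src/auth.py")
stale_result = gate.wait_fresh(timeout=0.01)  # times out while stale
gate.delta_applied()
fresh_result = gate.wait_fresh(timeout=0.01)
```

A query handler would call wait_fresh before reading the index, which matches the behavior described above where queries wait for the fresh signal.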
Session Model
Each MCP connection gets an isolated session. Sessions hold:
- candidate_maps: the recon call's result, required before refactoring
- mutation_ctx: pending refactor IDs awaiting refactor_commit or refactor_cancel
- read_only flag: set via recon(read_only=true) to block mutations
Sessions are scoped to a single agent conversation. No session state bleeds between connections.
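The session state listed above could be modeled roughly as follows. The field names mirror this page, while the Session class, its methods, the error type, and the shape of the stored values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


class ReadOnlySessionError(RuntimeError):
    """Raised when a mutation is attempted in a read-only session."""


@dataclass
class Session:
    repo: str
    read_only: bool = False
    # Result of the last recon call (value shape is an assumption).
    candidate_maps: Dict[str, List[str]] = field(default_factory=dict)
    # Refactor IDs that are still waiting for a commit or cancel.
    mutation_ctx: Set[str] = field(default_factory=set)

    def begin_refactor(self, refactor_id: str) -> None:
        if self.read_only:
            raise ReadOnlySessionError("session was opened with read_only=true")
        if not self.candidate_maps:
            raise RuntimeError("call recon before starting a refactor")
        self.mutation_ctx.add(refactor_id)

    def finish_refactor(self, refactor_id: str) -> None:
        # Called for both commit and cancel: the ID is no longer pending.
        self.mutation_ctx.discard(refactor_id)


# Each connection gets its own object, so no state is shared between them.
writer = Session(repo="my-repo")
writer.candidate_maps["auth"] = ["auth.py::login"]
writer.begin_refactor("rf-1")
reader = Session(repo="my-repo", read_only=True)
```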
Refactor Engine
Structural refactors (rename/move) are preview-first: the engine computes all edits and assigns each hunk a certainty level before any file is touched.
Certainty levels:
- high: unambiguous symbol resolution
- medium: probable match, context confirmed
- low: possible match, human review suggested
When low-certainty hunks are present, refactor_commit returns a verification_required flag. The agent should use refactor_commit(inspect_path=...) to review those matches before applying.
The apply step runs inside a mutation lock to prevent concurrent conflicting changes.
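To make the preview-first flow concrete, here is a small sketch that derives a verification_required flag from hunk certainty levels. The Hunk type, the preview_summary function, and the inspect_paths key are hypothetical; only the three certainty levels and the flag name come from this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Certainty(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class Hunk:
    path: str
    line: int
    certainty: Certainty


def preview_summary(hunks: List[Hunk]) -> Dict[str, object]:
    """Summarize a preview: low-certainty hunks require review before apply."""
    low_paths = sorted({h.path for h in hunks if h.certainty is Certainty.LOW})
    return {
        "verification_required": bool(low_paths),
        # Hypothetical field: files an agent would pass as inspect_path.
        "inspect_paths": low_paths,
    }


summary = preview_summary(
    [
        Hunk("auth.py", 10, Certainty.HIGH),
        Hunk("docs/notes.md", 3, Certainty.LOW),
        Hunk("api.py", 42, Certainty.MEDIUM),
    ]
)
```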
Lint Subsystem
CodeRecon auto-detects and runs linters, formatters, and type checkers. The lint subsystem supports:
| Category | Examples |
|---|---|
| Lint | ruff, eslint, pylint, rubocop, phpcs |
| Format | black, prettier, gofmt, cargo fmt |
| Type check | mypy, tsc, go vet, cargo clippy |
| Security | (via linter security rules) |
Lint runs are triggered by checkpoint and can auto-fix issues before tests run.
Testing Subsystem
Test discovery and execution is handled by Runner Packs — language-specific plugins that detect, discover, run, and parse tests. Tests are not exposed as separate MCP tools; they are integrated into checkpoint.
Test selection uses the import graph:
| Hop | Description |
|---|---|
| 0 | Test files that directly import a changed file |
| 1 | Test files that import files that import a changed file |
| N | Further transitive dependencies |
By default, test runs start with hop-0 tests only. If hop-0 passes, hop-1 runs next; if hop-0 fails, the transitive hops are skipped. When a commit_message is provided, hop depth auto-escalates to 2 for broader coverage.
Worktree Support
CodeRecon supports Git worktrees. Each worktree is registered separately and gets its own entry in the global catalog. The index is shared (read-only) across worktrees; only the worktree-specific delta is tracked independently.
    recon register-worktree   # register current worktree
    recon worktrees           # list all worktrees for this repo
Cross-Filesystem Detection (WSL)
When a repo is on a Windows filesystem path (e.g. /mnt/c/...), cross-filesystem SQLite I/O would be prohibitively slow. CodeRecon detects this automatically at register time and moves the .recon/ index to a native Linux path: ~/.local/share/coderecon/indices/<repo-hash>/. This is transparent to the agent.
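A sketch of how such a check might look. The /mnt/<drive letter>/ pattern and the use of a truncated SHA-256 digest as the repo hash are assumptions; this page does not say how the hash is derived or how the mount is detected.

```python
import hashlib
import re
from pathlib import PurePosixPath

# Assumption: Windows drives appear under /mnt/<single letter>/ in WSL.
_WINDOWS_MOUNT = re.compile(r"^/mnt/[a-z](/|$)")


def is_windows_mount(repo_path: str) -> bool:
    """Return True if the path looks like a Windows drive mounted in WSL."""
    return bool(_WINDOWS_MOUNT.match(repo_path))


def index_dir_for(repo_path: str, home: str = "~") -> PurePosixPath:
    """Choose where the index lives for a given repo path."""
    if is_windows_mount(repo_path):
        # Hypothetical hash scheme: first 16 hex chars of SHA-256 of the path.
        digest = hashlib.sha256(repo_path.encode("utf-8")).hexdigest()[:16]
        return PurePosixPath(home) / ".local/share/coderecon/indices" / digest
    # Native filesystem: keep the index next to the code.
    return PurePosixPath(repo_path) / ".recon"


native = index_dir_for("/home/me/project")
relocated = index_dir_for("/mnt/c/Users/me/project", home="/home/me")
```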
Containerized Environments
CodeRecon runs as a local Python process and inspects the host filesystem directly. When a project's runtime, dependencies, or configuration are abstracted away inside a Docker container or dev container, CodeRecon has no visibility into that environment.
Functionality that may be degraded:
- Dependency resolution — installed packages are read from the host Python environment, not the container's.
- Framework and runtime detection — version checks and feature detection rely on what is available locally.
- Test discovery — test runners and their plugins must be installed on the host for CodeRecon to detect and enumerate test targets.
- Language server / SCIP indexing — indexers run on the host and resolve imports against host-installed packages.
If your development workflow is fully containerized, consider running CodeRecon inside the container where the project environment is materialized, or ensure the host has a matching virtual environment.
File Watching and Delta Indexing
Once recon up starts the daemon, a file watcher monitors the repository for changes. When files are saved:
- Changed files are queued for re-parsing
- Tier 1–3 deltas are applied to SQLite
- Tantivy index is updated incrementally
This keeps the index fresh without requiring a full re-index between edits.
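The three steps above can be sketched as a queue that collapses repeated saves, plus a loop that applies each change to both stores. The ChangeQueue class and the three callables are hypothetical stand-ins for the watcher, the Tree-sitter parser, the SQLite writer, and the Tantivy updater.

```python
from typing import Callable, Dict, List


class ChangeQueue:
    """Collects changed paths; repeated saves of one file collapse to one entry."""

    def __init__(self) -> None:
        self._pending: Dict[str, None] = {}  # dict keys keep insertion order

    def on_change(self, path: str) -> None:
        self._pending[path] = None

    def drain(self) -> List[str]:
        batch = list(self._pending)
        self._pending.clear()
        return batch


def apply_batch(
    batch: List[str],
    reparse: Callable[[str], dict],
    apply_sqlite_delta: Callable[[str, dict], None],
    update_lexical: Callable[[str], None],
) -> None:
    for path in batch:
        facts = reparse(path)            # re-parse the changed file
        apply_sqlite_delta(path, facts)  # Tier 1-3 deltas
        update_lexical(path)             # incremental lexical index update


log = []
queue = ChangeQueue()
for saved in ["a.py", "b.py", "a.py"]:
    queue.on_change(saved)

apply_batch(
    queue.drain(),
    reparse=lambda p: {"defs": []},
    apply_sqlite_delta=lambda p, facts: log.append(("sqlite", p)),
    update_lexical=lambda p: log.append(("tantivy", p)),
)
```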
SDK
CodeRecon provides an async Python SDK for programmatic access. The SDK spawns the daemon over stdio (no port allocation needed) and exposes all MCP tools as typed async methods:
    import asyncio
    from coderecon.sdk import CodeRecon

    async def main():
        async with CodeRecon() as cr:  # daemon is spawned over stdio
            await cr.register("/path/to/repo")
            result = await cr.recon("my-repo", task="find auth logic")

    asyncio.run(main())
The SDK also provides framework adapters:
- as_openai_tools(): convert tools to OpenAI function-calling schema
- as_langchain_tools(): convert tools to LangChain tool schema
See the SDK specification for wire protocol details.