A retrieval layer that knows it's talking to an LLM.
RAG, but with the retrieval part actually done well. Hybrid search, pre-shaped responses, native MCP, and a stack that runs on the same laptop as your agent.
{
"jsonrpc": "2.0",
"id": 7,
"method": "tools/call",
"params": {
"name": "stratafs.search",
"arguments": {
"query": "where do we handle JWT refresh",
"sources": [
"code",
"docs"
],
"mode": "hybrid",
"limit": 3
}
}
} MCP tools/call over JSON-RPC 2.0 — exactly what Claude Desktop, ChatGPT plugins, and custom agents speak.
The "we just need RAG" problem
Every LLM app needs retrieval. Most teams end up gluing together: a crawler, a chunker, an embedding API, a vector DB, a search service, an agent-side tool wrapper. Each is fine alone, but the joints are where reliability goes to die — drift between chunkers, embeddings re-running after a model swap, the vector DB drifting from the source of truth.
StrataFS collapses that stack to one process. The chunker, the embedder, the index, and the agent tool live in the same binary, share the same schema, and update atomically when a file changes.
Default integration: drop one URL into your MCP client
config and the agent gets stratafs.search,
stratafs.get_chunk, stratafs.list_sources, and
stratafs.stat automatically.
Why hybrid retrieval, specifically
Pure vector search misses on exact identifiers ("the
OAuth2RefreshGrant function"). Pure BM25 misses on intent
("how do we exchange a stale token"). Hybrid gets both. StrataFS does
this in one SQL query, so latency stays under 100 ms.
You can also force a mode per query: mode=fts when you know
the exact phrase, mode=vector when you want the model to
explore semantic neighbors. See the search engine
page for the query shape.
Pre-shaped responses
A search server that returns raw rows forces every agent to reshape and
truncate. StrataFS returns { type: "text", text: "<path>:<line> — <snippet>" }
blocks pre-trimmed to a sensible context budget. Less prompt engineering
per integration, fewer tokens burned on JSON keys.
What you can build on top
- Code Q&A agents — your agent answers "where do we handle X" by reading actual repository code.
- Documentation copilots — index Drive, SharePoint, Confluence; agent cites real internal docs.
- Ops/incident bots — index runbooks + Jira + dashboards; agent surfaces the right runbook by symptom, not by keyword.
- Internal research — index Slack exports + product spec PDFs + Drive; let the model summarize across them.
It runs where your agent runs
The default 127.0.0.1 bind means StrataFS lives on the same
box as your agent. For server-side agents, run StrataFS in a sidecar
container — same Docker image, same MCP URL, just a different host.
Cost model
Zero. The embedding model runs locally (ONNX Runtime), the search engine is SQLite, and the binary is MIT-licensed. The only marginal cost is disk: roughly 1.5–2× the indexed text size, with compression on. No API bill that scales with token volume.
Give your agent retrieval that doesn't suck.
MCP-native. Hybrid. Local. MIT licensed.