For AI engineers

A retrieval layer that knows it's talking to an LLM.

RAG, but with the retrieval part actually done well. Hybrid search, pre-shaped responses, native MCP, and a stack that runs on the same laptop as your agent.

{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "stratafs.search",
    "arguments": {
      "query": "where do we handle JWT refresh",
      "sources": [
        "code",
        "docs"
      ],
      "mode": "hybrid",
      "limit": 3
    }
  }
}

MCP tools/call over JSON-RPC 2.0: the protocol that Claude Desktop, other MCP-capable clients, and custom agents speak.

The "we just need RAG" problem

Every LLM app needs retrieval. Most teams end up gluing together a crawler, a chunker, an embedding API, a vector DB, a search service, and an agent-side tool wrapper. Each piece is fine alone, but the joints are where reliability goes to die: chunkers that disagree with each other, embeddings that have to be recomputed after a model swap, and a vector DB that drifts from the source of truth.

StrataFS collapses that stack to one process. The chunker, the embedder, the index, and the agent tool live in the same binary, share the same schema, and update atomically when a file changes.
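To make "update atomically" concrete, here is a minimal sketch of replacing every chunk of a changed file inside one SQLite transaction. It illustrates the idea only and is not StrataFS's actual code: the `chunks` table, the fixed-width `chunk` helper, and the function names are all hypothetical, and the embedding step is left out.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a toy index. The schema is an assumption for this sketch."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " path TEXT NOT NULL, seq INTEGER NOT NULL, text TEXT NOT NULL,"
        " PRIMARY KEY (path, seq))"
    )
    return conn


def chunk(text, size=200):
    """Naive fixed-width chunker, standing in for a real one."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def reindex_file(conn, path, text):
    """Replace all chunks of one file in a single transaction."""
    rows = [(path, seq, piece) for seq, piece in enumerate(chunk(text))]
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute("DELETE FROM chunks WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO chunks (path, seq, text) VALUES (?, ?, ?)", rows
        )
    return len(rows)


conn = open_index()
reindex_file(conn, "auth/refresh.py", "x" * 450)  # first version: 3 chunks
reindex_file(conn, "auth/refresh.py", "y" * 100)  # edited file: 1 chunk
```

Because the delete and the inserts commit together, a reader of the index sees either the old chunks or the new ones, never a mix.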

Default integration: drop one URL into your MCP client config and the agent gets stratafs.search, stratafs.get_chunk, stratafs.list_sources, and stratafs.stat automatically.

Why hybrid retrieval, specifically

Pure vector search misses on exact identifiers ("the OAuth2RefreshGrant function"). Pure BM25 misses on intent ("how do we exchange a stale token"). Hybrid gets both. StrataFS runs both in a single SQL query, with no second round trip to a separate service, so latency stays under 100 ms.
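One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below shows that general technique in plain Python so the intuition is visible. It is not a description of how StrataFS fuses results internally, and the chunk ids are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one list, best first.

    Each list contributes 1 / (k + rank) per item, so an item that ranks
    well in both lists beats an item that ranks well in only one.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id to keep output stable.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results for "where do we handle JWT refresh"
bm25_hits = ["auth/refresh.py#2", "auth/oauth.py#5", "docs/tokens.md#1"]
vector_hits = ["docs/tokens.md#1", "auth/refresh.py#2", "auth/session.py#9"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused[:3])
```

The two chunks that appear in both lists rise to the top, which is the behavior the hybrid mode is meant to deliver.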

You can also force a mode per query: mode=fts when you know the exact phrase, mode=vector when you want the model to explore semantic neighbors. See the search engine page for the query shape.
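A per-query mode is one argument in the tools/call payload. The sketch below builds that payload in Python, mirroring the request shown at the top of this page. The helper name `build_search_call` and its defaults are hypothetical, and the set of modes is assumed to be exactly the three named here (hybrid, fts, vector).

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids only need to be unique per connection

# Assumption: these are the only modes the server accepts.
VALID_MODES = {"hybrid", "fts", "vector"}


def build_search_call(query, sources=("code", "docs"), mode="hybrid",
                      limit=3, request_id=None):
    """Build a JSON-RPC 2.0 tools/call request for stratafs.search."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_ids) if request_id is None else request_id,
        "method": "tools/call",
        "params": {
            "name": "stratafs.search",
            "arguments": {
                "query": query,
                "sources": list(sources),
                "mode": mode,
                "limit": limit,
            },
        },
    }


# Exact phrase known: force full-text search.
exact = build_search_call("OAuth2RefreshGrant", sources=["code"], mode="fts")
print(json.dumps(exact, indent=2))
```

Validating the mode on the client side is a design choice for this sketch: it fails fast on a typo instead of spending a round trip on a request the server would reject.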

Pre-shaped responses

A search server that returns raw rows forces every agent to reshape and truncate. StrataFS returns { type: "text", text: "<path>:<line> — <snippet>" } blocks pre-trimmed to a sensible context budget. Less prompt engineering per integration, fewer tokens burned on JSON keys.
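If an agent does need structured fields back, the block format above is easy to split. Here is a minimal parsing sketch, assuming the text always follows the `<path>:<line>`, dash, `<snippet>` layout shown above; the `Hit` type and `parse_block` name are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    path: str
    line: int
    snippet: str


# "\u2014" is the dash separator from the block format. The path match is
# non-greedy so the split happens at the first ":<digits> <dash> " sequence.
BLOCK_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+) \u2014 (?P<snippet>.*)$", re.DOTALL
)


def parse_block(block: dict) -> Optional[Hit]:
    """Turn one content block into a Hit, or None if it does not match."""
    if block.get("type") != "text":
        return None
    m = BLOCK_RE.match(block.get("text", ""))
    if m is None:
        return None
    return Hit(m["path"], int(m["line"]), m["snippet"])


example = {
    "type": "text",
    "text": "src/auth/refresh.py:42 \u2014 def refresh_token(session):",
}
print(parse_block(example))
```

Returning None for anything unexpected keeps the agent loop simple: unparsed blocks can still be passed to the model as plain text.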

What you can build on top

Code Q&A agents — your agent answers "where do we handle X" by reading actual repository code.
Documentation copilots — index Drive, SharePoint, Confluence; agent cites real internal docs.
Ops/incident bots — index runbooks + Jira + dashboards; agent surfaces the right runbook by symptom, not by keyword.
Internal research — index Slack exports + product spec PDFs + Drive; let the model summarize across them.

It runs where your agent runs

StrataFS binds to 127.0.0.1 by default, so it lives on the same box as your agent. For server-side agents, run StrataFS in a sidecar container: same Docker image, same MCP endpoint path, just a different host in the URL.

Cost model

Zero in license and API fees. The embedding model runs locally (ONNX Runtime), the search engine is SQLite, and the binary is MIT-licensed. The only marginal cost is disk: roughly 1.5–2× the indexed text size, with compression on. No API bill that scales with token volume.
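As a quick sizing aid, the disk figure above turns into a two-line estimate. This sketch simply applies the stated 1.5–2× range to a corpus size; the function name is hypothetical and real usage will depend on your content.

```python
GIB = 1024 ** 3


def estimate_index_bytes(text_bytes, low=1.5, high=2.0):
    """Return the (low, high) on-disk size range for a corpus of raw text."""
    if text_bytes < 0:
        raise ValueError("text_bytes must be non-negative")
    return int(text_bytes * low), int(text_bytes * high)


# Example: 10 GiB of indexed text
low_bytes, high_bytes = estimate_index_bytes(10 * GIB)
print(f"{low_bytes / GIB:.1f} to {high_bytes / GIB:.1f} GiB on disk")
```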

Give your agent retrieval that doesn't suck.

MCP-native. Hybrid. Local. MIT licensed.

Read the MCP integration →
Article: MCP for knowledge bases