The Model Context Protocol — Anthropic’s late-2024 spec, now broadly adopted — solved a real coordination problem. Before MCP, every agent integration was bespoke glue: my tool descriptions, your tool dispatch, their argument schema. After MCP, an agent points at a URL and gets a tool catalog it knows how to call.
But “speaks MCP” is just the floor. A good knowledge-base server has to make a thousand small decisions about chunking, ranking, response shape, and isolation — decisions that determine whether the agent actually answers questions correctly or hallucinates with confidence.
This is the long form of how we made those decisions in StrataFS.
What an MCP server actually does
MCP is JSON-RPC 2.0. For a tool server, three method families matter:
- tools/list: return the catalog (name, description, and input schema for each tool).
- tools/call: invoke a tool by name and return content blocks.
- resources/*: list and read resources (optional but recommended).
The wire format is dull, which is exactly the point. Once a client speaks MCP, it can talk to any server.
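To make the wire format concrete, here is a minimal sketch of the two requests a client sends. The jsonrpc, id, method, and params fields are standard JSON-RPC 2.0, and name/arguments is the MCP shape for tools/call. The helper name rpc_request and the example arguments are illustrative, not part of StrataFS.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC ids only need to be unique per connection

def rpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (helper name is illustrative)."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request

# Ask the server for its tool catalog.
list_tools = rpc_request("tools/list")

# Invoke one tool by name; the arguments must match that tool's input schema.
call_search = rpc_request(
    "tools/call",
    {"name": "stratafs.search", "arguments": {"query": "refresh tokens", "limit": 5}},
)

print(json.dumps(call_search, indent=2))
```

In practice an MCP client library builds these envelopes for you; the point is only that there is nothing exotic underneath.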
StrataFS exposes four tools:
[
{ "name": "stratafs.search", /* hybrid search across sources */ },
{ "name": "stratafs.get_chunk", /* fetch one chunk by id */ },
{ "name": "stratafs.list_sources", /* enumerate sources + sync state */ },
{ "name": "stratafs.stat", /* health, queue depth */ }
]
Four feels small. It is. Most knowledge-base servers either have too many tools (one per file type, one per source — exploding the agent’s context window with tool descriptions) or too few (just a single search that returns IDs the agent then has to fetch one at a time). Four is the right number: one for retrieval, one for hydration, one for situational awareness, one for health.
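For a sense of what one catalog entry costs in context, here is a hypothetical sketch of the stratafs.search entry. The inputSchema key is how MCP carries a tool's JSON Schema, and the argument names (query, limit, mode, sources) are the ones this post uses. The description text, the default limit of 5, and the with_defaults helper are assumptions made for illustration.

```python
# Hypothetical catalog entry: description text and schema details are assumptions.
SEARCH_TOOL = {
    "name": "stratafs.search",
    "description": "Hybrid search across sources.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5},
            "mode": {
                "type": "string",
                "enum": ["hybrid", "fts", "vector"],
                "default": "hybrid",
            },
            "sources": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["query"],
    },
}

def with_defaults(tool: dict, arguments: dict) -> dict:
    """Fill in schema defaults and reject calls missing a required argument."""
    schema = tool["inputSchema"]
    missing = [key for key in schema["required"] if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    defaults = {
        key: prop["default"]
        for key, prop in schema["properties"].items()
        if "default" in prop
    }
    # Explicit arguments win over defaults.
    return {**defaults, **arguments}
```

Every entry like this is sent to the model on each turn, which is why a small catalog with short descriptions matters.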
Pre-shaped responses are the difference
A naive search server returns rows:
{ "results": [
{ "id": 1492, "path": "auth/refresh.go", "line": 18, "content": "..." },
{ "id": 1493, "path": "docs/runbooks/auth-rotation.md", "line": 0, "content": "..." }
]}
The agent now has to:
- Parse it.
- Reshape it for the model.
- Truncate it.
- Decide which IDs to expand.
Most agent codebases do this slightly wrong, every time. StrataFS does it once, in the server:
{ "content": [
{ "type": "text", "text": "auth/middleware/refresh.go:18 — RefreshToken(ctx, raw) validates the refresh JWT and issues a new access token..." },
{ "type": "text", "text": "docs/runbooks/auth-rotation.md — sequence diagram for /v2/auth/refresh; access tokens are 15-minute lived..." }
]}
The agent gets text, already trimmed to a sensible context budget, already attributed by path. No JSON ceremony in the prompt. No “now decide which result to expand” loop. Every saved token is a token spent on reasoning.
The bottleneck in retrieval is rarely network or compute. It’s the context window. A server that pre-shapes its responses pays for itself in every single agent turn.
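As a rough illustration of pre-shaping, the sketch below turns raw rows into attributed text blocks under a fixed budget. It is not StrataFS's shaper: the function name, the character-based budget (a stand-in for a token budget), and the per-hit cap are all assumptions.

```python
def shape_results(rows: list, budget_chars: int = 600, per_hit_chars: int = 240) -> dict:
    """Turn raw search rows into MCP-style text blocks within a character budget."""
    content = []
    remaining = budget_chars
    for row in rows:
        # Attribute each snippet by path, adding the line number when it is known.
        location = f"{row['path']}:{row['line']}" if row.get("line") else row["path"]
        snippet = " ".join(row["content"].split())  # collapse runs of whitespace
        limit = min(per_hit_chars, remaining - len(location) - 2)
        if limit < 4:
            break  # budget exhausted; lower-ranked hits are dropped
        if len(snippet) > limit:
            snippet = snippet[: limit - 3].rstrip() + "..."
        text = f"{location}: {snippet}"
        content.append({"type": "text", "text": text})
        remaining -= len(text)
    return {"content": content}
```

Because rows arrive in rank order, truncation always costs the least relevant hits first.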
Hybrid retrieval for agents, specifically
LLMs alternate between two query modes within a single conversation:
- Identifier queries: “find the OAuth2Provider class”, “show me parseJWT”.
- Intent queries: “where do we refresh tokens”, “how does auth rotation work”.
A vector-only retriever loses the first: an identifier like OAuth2Provider carries little meaning when it is embedded as if it were a sentence. A BM25-only retriever loses the second: the file with the answer might not contain the word “refresh”. Agents stuck with a single-mode retriever either give up on the queries it can’t serve or fall back to listing files, which they then read serially, which is slow and noisy.
StrataFS gives the agent one tool that handles both. The mode argument defaults to hybrid; the agent can override with mode=fts when it knows a literal, or mode=vector when it wants to explore. Most agents never set it.
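The post does not say how StrataFS combines the two result lists, so the sketch below uses reciprocal rank fusion, one common way to merge keyword and vector rankings, purely as an illustration. The fts and vector callables are hypothetical stand-ins for the two retrievers; the mode names match the ones above.

```python
from collections import defaultdict

def rrf_fuse(rankings: list, k: int = 60, limit: int = 5) -> list:
    """Merge ranked id lists with reciprocal rank fusion: score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties break on id so the order is deterministic.
    ordered = sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))
    return ordered[:limit]

def search(query: str, fts, vector, mode: str = "hybrid", limit: int = 5) -> list:
    """Dispatch on mode; fts and vector are callables returning ranked chunk ids."""
    if mode == "fts":
        return fts(query)[:limit]
    if mode == "vector":
        return vector(query)[:limit]
    if mode == "hybrid":
        return rrf_fuse([fts(query), vector(query)], limit=limit)
    raise ValueError(f"unknown mode: {mode}")
```

Rank-based fusion avoids having to normalize BM25 scores against cosine similarities, which live on unrelated scales.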
Per-source isolation matters more than you’d think
Each StrataFS source — a local directory, an S3 bucket, a Drive folder — gets its own SQLite database. The agent sees them as a unified search but the storage is segmented. Two consequences:
- Permission failures don’t poison the index. If your Drive token expires, the Drive source goes stale; the local code source is unaffected. The stratafs.list_sources tool tells the agent what’s stale, so the agent can warn the user instead of confidently citing 6-week-old docs.
- Source-level scoping is free. An agent can call stratafs.search with sources: ["code", "docs"] to exclude Jira; the SQL just doesn’t open the Jira database. No filters applied after-the-fact, no read-amplification.
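The scoping idea can be sketched with Python's sqlite3 module. Everything concrete here is an assumption: the one-file-per-source layout, the chunks table, and the LIKE match (standing in for a real full-text or vector index). The point is only that a source left out of the request is never opened.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

def create_source(root: Path, name: str, chunks: list) -> None:
    """Create one database file per source (hypothetical schema)."""
    with closing(sqlite3.connect(root / f"{name}.db")) as db:
        db.execute("CREATE TABLE IF NOT EXISTS chunks (path TEXT, content TEXT)")
        db.executemany("INSERT INTO chunks VALUES (?, ?)", chunks)
        db.commit()

def search(root: Path, query: str, sources: list) -> list:
    """Search only the named sources; other databases are never opened."""
    hits = []
    for name in sources:
        with closing(sqlite3.connect(root / f"{name}.db")) as db:
            rows = db.execute(
                "SELECT path FROM chunks WHERE content LIKE ?", (f"%{query}%",)
            ).fetchall()
        hits.extend((name, path) for (path,) in rows)
    return hits
```

A failure in one source's file (locked, missing, corrupt) can then be handled per source without affecting queries against the others.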
What “MCP-native” actually means
A surprising number of “MCP integrations” are HTTP proxies in front of an existing search service. They speak MCP at the edge and REST internally; every call does double-marshalling, double-pagination, double-context-trimming.
StrataFS is built MCP-first. The Go MCP package, the result shapers, the source listing: all of it is written for the agent’s needs, not the human’s. The REST API is the same engine exposed for non-agent consumers; it shares the same shapers, so its responses are just as agent-friendly when you need them.
The practical upshot: tool calls are cheap. A single stratafs.search typically completes in 80–150 ms warm, including the agent-side round trip. That’s fast enough for an agent to do multiple searches in a single turn — which is the right pattern when the first query is exploratory.
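The multi-search pattern might look like the sketch below, which fans out several queries concurrently and merges the snippets. The call_tool parameter stands in for whatever MCP client call your agent uses, and fake_call_tool is a hypothetical stub so the example runs without a server.

```python
import asyncio

async def search_many(call_tool, queries: list, limit: int = 5) -> list:
    """Issue several searches concurrently and merge snippets, dropping duplicates."""
    results = await asyncio.gather(
        *(call_tool("stratafs.search", {"query": q, "limit": limit}) for q in queries)
    )
    merged, seen = [], set()
    for snippets in results:  # gather preserves the order of the queries
        for snippet in snippets:
            if snippet not in seen:
                seen.add(snippet)
                merged.append(snippet)
    return merged

async def fake_call_tool(name: str, arguments: dict) -> list:
    """Stand-in for a real MCP client call, so the sketch runs without a server."""
    await asyncio.sleep(0.01)
    return [f"result for {arguments['query']}", "shared snippet"]

merged = asyncio.run(
    search_many(fake_call_tool, ["rate limit config", "rate limit middleware"])
)
print(merged)
```

With concurrent calls, the turn costs roughly one round trip rather than one per query.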
Wiring it into Claude Desktop
The configuration is one stanza:
{
  "mcpServers": {
    "stratafs": {
      "url": "http://localhost:8081/mcp"
    }
  }
}
Restart Claude. The next time you ask “What does the auth/middleware/refresh.go file do?”, Claude calls stratafs.search, reads the trimmed snippets, and answers from your actual code. Walkthrough with screenshots: Claude Desktop + StrataFS setup.
Wiring it into custom agents
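If you’re building your own agent (LangChain, Vercel AI SDK, Anthropic SDK with tool use, whatever), the Python MCP client is a few lines. A sketch with the official mcp Python SDK, assuming the server speaks the streamable HTTP transport at this URL:
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:8081/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake before any tool call
            result = await session.call_tool(
                "stratafs.search",
                {"query": "rate limit configuration", "limit": 5},
            )
            for block in result.content:
                print(block.text)

asyncio.run(main())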
Translate to TypeScript by swapping the SDK. The protocol is the same.
Failure modes you should know about
Three things to watch when running a retrieval-backed agent:
- Stale indexes. If a file moved 90 seconds ago and the cloud source polls every 5 minutes, the agent might cite a stale chunk. StrataFS’s stratafs.stat returns the per-source last-sync timestamp; surface it in the agent’s system prompt: “Note: source X last synced 4 minutes ago.”
- Ambiguous queries. “Where’s the rate limit” can mean “configuration” or “implementation”. Agents that don’t disambiguate burn turns. Train the agent to do two parallel calls: mode=fts for the literal phrase, mode=vector for the concept. StrataFS handles concurrent calls fine.
- Token blow-up. Even with pre-shaping, an agent that calls stratafs.search with limit=20 is asking for trouble. Cap the limit in the agent’s system prompt, not in the server; the right number depends on the task.
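For the stale-index case, a small helper can turn per-source last-sync timestamps into the kind of note suggested above. This is a sketch: the function name, the input shape (a mapping from source name to a timezone-aware datetime), and the 5-minute threshold are assumptions, and how you obtain the timestamps depends on the tool response you parse.

```python
from datetime import datetime, timedelta, timezone

def staleness_notes(
    last_sync: dict,
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),
) -> list:
    """Return one system-prompt note per source whose last sync is older than max_age."""
    notes = []
    for source, synced_at in sorted(last_sync.items()):
        age = now - synced_at
        if age > max_age:
            minutes = int(age.total_seconds() // 60)
            notes.append(f"Note: source {source} last synced {minutes} minutes ago.")
    return notes

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
notes = staleness_notes(
    {"drive": now - timedelta(minutes=42), "code": now - timedelta(minutes=1)},
    now,
)
print("\n".join(notes))
```

Only sources past the threshold produce a note, so a healthy deployment adds nothing to the prompt.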
What’s next
We’re shipping streaming results (Transfer-Encoding: chunked) next, so an agent can start reading the first match while the rest are still being scored. RBAC with source-level permissions is the release after that — important for shared deployments where not every agent should see every source.
If you’re building agents that need to understand a real knowledge base — not a toy demo — try StrataFS. Install in 30 seconds. Read the MCP integration page for the tool catalog. Or jump straight to the Claude Desktop walkthrough.
The protocol is standardized. The hard part is what the server does behind the protocol. That’s where StrataFS spends its time.