If you’ve ever typed grep -ri "where do we handle" you already know the punchline. grep is a precision instrument for finding strings. The thing you actually wanted to find is a concept — and the file that owns it might not contain the word you guessed.

This article is about the alternative. It is not “AI autocomplete” and it is not a chatbot. It is a search engine that happens to use embeddings, runs on your laptop, and can be called from your editor, your CLI, or your AI agent.

Why grep gets you 70% of the way and then stops

grep and its modern descendant ripgrep are built around matching text patterns: literal strings and regular expressions. They’re fast because they don’t try to be clever. When you know the function name (RefreshToken, OAuth2Provider, parseJWT), they’re unbeatable.

The break appears the moment you don’t know the exact name. Three failures keep recurring:

  • Synonyms. You search for retry; the file calls it backoff.
  • Concept drift. You’re hunting “where do we sign JWTs”; the answer lives in internal/auth/keys.go under the function MintAccessToken, and the word “JWT” never appears in it.
  • Cross-file context. The behavior you care about is a flow: handler reads config, calls a service, which calls another. Grep finds you any one of those by keyword but not the connection between them.
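
To make the synonym failure concrete, here is a minimal sketch of literal, case-insensitive matching in the spirit of grep -ri. The two file paths and their contents are invented for illustration; nothing here comes from a real repository.

```typescript
// Literal, case-insensitive substring search, roughly what `grep -ri` does.
// The two "files" below are hypothetical, made up for this example.
const files: Record<string, string> = {
  "net/client.go": "func withBackoff(attempts int) { /* wait, then try the call again */ }",
  "net/README.md": "Set MAX_RETRY to control how many times a request is retried.",
};

function literalSearch(needle: string, corpus: Record<string, string>): string[] {
  const n = needle.toLowerCase();
  return Object.entries(corpus)
    .filter(([, text]) => text.toLowerCase().includes(n))
    .map(([path]) => path);
}

console.log(literalSearch("retry", files));   // only net/README.md matches
console.log(literalSearch("backoff", files)); // only net/client.go matches
```

Neither query surfaces both files, even though both files are about the same behavior. That gap is what the rest of this article is about.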

Semantic search, built on vector embeddings of code and prose, was designed to handle exactly those cases. But pure semantic search has its own failure mode: it loses to grep on exact identifiers, because a lone token like OAuth2Provider gives an embedding model almost no sentence-level meaning to work with.

The right answer is to run both, and rank the results together. That’s what StrataFS does.

Hybrid search, in one query

StrataFS keeps three signals per chunk:

  1. BM25 from SQLite’s FTS5 — how well does this chunk match the literal tokens you typed?
  2. Vector cosine similarity from a local ONNX embedding — how semantically close is this chunk’s meaning to your query?
  3. Metadata — how recently was the file edited; does the filename match the query; what’s the file type?
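
For readers who have not met the second and third signals before, here is a rough sketch of how they can be computed. The cosine function is the standard formula. The recency function is an assumption: StrataFS's actual metadata scoring is not described here, and the exponential half-life decay is just one plausible shape.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as "no similarity"
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical recency score (not StrataFS's formula):
// 1.0 for a file edited just now, 0.5 after one half-life, 0.25 after two.
function recencyScore(ageDays: number, halfLifeDays = 30): number {
  return Math.pow(0.5, Math.max(ageDays, 0) / halfLifeDays);
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0: orthogonal vectors share nothing
console.log(recencyScore(30));                 // 0.5: exactly one half-life old
```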

A single SQL query fuses them with configurable weights. You can toggle to FTS-only when you know the exact phrase (mode=fts), or to vector-only when you want to explore neighbours (mode=vector). The default mode=hybrid is what most queries want.
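
StrataFS does the fusion in SQL, but the idea is easier to see in a few lines of TypeScript. Treat what follows as a sketch under stated assumptions: the weights are invented, the weighted-average formula is one common way to fuse scores rather than the one StrataFS necessarily uses, and every signal is assumed to be normalized to the range 0 to 1 beforehand. (Raw FTS5 bm25() values would need that normalization step, since in SQLite a better match is a numerically smaller number.)

```typescript
type Mode = "hybrid" | "fts" | "vector";

interface ChunkSignals {
  path: string;
  bm25: number;     // literal match strength, assumed normalized to [0, 1]
  cosine: number;   // embedding similarity, assumed normalized to [0, 1]
  metadata: number; // recency / filename / file-type score, assumed in [0, 1]
}

interface Weights { bm25: number; cosine: number; metadata: number; }

// Illustrative defaults only; these are not StrataFS's real weights.
const DEFAULT_WEIGHTS: Weights = { bm25: 0.4, cosine: 0.4, metadata: 0.2 };

// Weighted average of the three signals; fts and vector modes use one signal alone.
function fusedScore(c: ChunkSignals, mode: Mode = "hybrid", w: Weights = DEFAULT_WEIGHTS): number {
  if (mode === "fts") return c.bm25;
  if (mode === "vector") return c.cosine;
  const total = w.bm25 + w.cosine + w.metadata;
  return (w.bm25 * c.bm25 + w.cosine * c.cosine + w.metadata * c.metadata) / total;
}

// Returns a new array sorted best-first; the input is left untouched.
function rank(chunks: ChunkSignals[], mode: Mode = "hybrid", w: Weights = DEFAULT_WEIGHTS): ChunkSignals[] {
  return [...chunks].sort((a, b) => fusedScore(b, mode, w) - fusedScore(a, mode, w));
}

// Hypothetical signal values, chosen to show how the modes disagree.
const chunks: ChunkSignals[] = [
  { path: "auth/middleware/refresh.go", bm25: 0.9, cosine: 0.8, metadata: 0.7 },
  { path: "docs/runbooks/auth-rotation.md", bm25: 0.3, cosine: 0.9, metadata: 0.5 },
  { path: "vendor/jwt/LICENSE", bm25: 0.6, cosine: 0.1, metadata: 0.1 },
];

console.log(rank(chunks).map(c => c.path));           // hybrid ordering
console.log(rank(chunks, "fts").map(c => c.path));    // literal-only ordering
console.log(rank(chunks, "vector").map(c => c.path)); // embedding-only ordering
```

With these made-up numbers, FTS-only ranks the LICENSE file above the runbook because it repeats the query token, while vector-only puts the runbook first. The hybrid score keeps the code file on top and pushes the LICENSE to the bottom.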

stratafs search "where do we handle JWT refresh" --mode hybrid --limit 5

Real result from a real auth-heavy codebase:

1. auth/middleware/refresh.go:18      score=0.86
   // RefreshToken validates the refresh JWT and issues a new access token.

2. docs/runbooks/auth-rotation.md     score=0.78
   When the access token expires we exchange the long-lived refresh token at /v2/auth/refresh.

3. internal/session/jwt.go:42         score=0.69
   // JWT signing keys are rotated daily. Refresh tokens persist 30 days.

That’s three files of two different kinds (code, runbook, code) surfaced from one query. The first result ranks highest because the filename, the comment, and the semantic intent all align.

Pure vector search gets little signal from the filename refresh.go, because a filename isn’t a sentence. Pure BM25 tends to bury docs like the rotation runbook, whose vocabulary only partly overlaps with yours (it never says “JWT”). Hybrid surfaces both.

The “I’m new to this codebase” workflow

Three queries I run on day one of any new repo:

  • stratafs search "what's the entry point" --mode hybrid
  • stratafs search "where do we authenticate users" --path "**/*.go"
  • stratafs search "rate limit configuration" --since 90d

The third one matters because StrataFS search is also recency-aware. “What’s been touched recently in the rate-limit area?” is a question grep alone can’t answer. StrataFS can: --since narrows the results to recently edited files, and the metadata score includes a recency component.
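
If you are wondering what a window like 90d amounts to, here is a minimal sketch of a recency filter. It is an illustration, not StrataFS's implementation: the IndexedFile shape and both function names are hypothetical, and only the day suffix used above is handled.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Parses a "--since"-style window such as "90d" into milliseconds.
// Only the day suffix seen in this article is handled; other units are left out on purpose.
function parseSince(window: string): number {
  const m = /^(\d+)d$/.exec(window.trim());
  if (!m) throw new Error(`unsupported window: ${window}`);
  return Number(m[1]) * DAY_MS;
}

// Hypothetical record for an indexed file.
interface IndexedFile { path: string; modifiedAt: Date; }

// Keeps only the files modified within the window, measured back from `now`.
function editedSince(files: IndexedFile[], window: string, now: Date = new Date()): IndexedFile[] {
  const cutoff = now.getTime() - parseSince(window);
  return files.filter(f => f.modifiedAt.getTime() >= cutoff);
}

const now = new Date("2024-06-01T00:00:00Z");
const indexed: IndexedFile[] = [
  { path: "config/ratelimit.yaml", modifiedAt: new Date("2024-05-01T00:00:00Z") },
  { path: "legacy/throttle.go", modifiedAt: new Date("2023-01-01T00:00:00Z") },
];
console.log(editedSince(indexed, "90d", now).map(f => f.path)); // only the YAML file is recent enough
```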

Adding it to your editor

The REST API is your friend:

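// Sketch of a VS Code extension command; register it with vscode.commands.registerCommand.
import * as vscode from 'vscode';

// Result fields as used below; nothing beyond these is assumed about the response.
interface SearchResult { path: string; line?: number; snippet: string; score: number; source: string; }

async function semanticGoto(query: string): Promise<void> {
  const res = await fetch(`http://localhost:8080/search?q=${encodeURIComponent(query)}&limit=10`);
  if (!res.ok) {
    vscode.window.showErrorMessage(`StrataFS search failed: HTTP ${res.status}`);
    return;
  }
  const { results } = (await res.json()) as { results: SearchResult[] };
  const picked = await vscode.window.showQuickPick(results.map(r => ({
    label: r.line ? `${r.path}:${r.line}` : r.path,
    description: r.snippet,
    detail: `score=${r.score.toFixed(2)} · ${r.source}`,
    target: r,
  })));
  if (picked) {
    // Assumes result lines are 1-based, as in the CLI output; vscode.Range is 0-based.
    const line = Math.max((picked.target.line ?? 1) - 1, 0);
    await vscode.window.showTextDocument(
      vscode.Uri.file(picked.target.path),
      { selection: new vscode.Range(line, 0, line, 0) },
    );
  }
}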

A working wrapper is ~50 lines per editor. For Neovim, a Telescope picker over /search is a saner default than fzf-over-rg for natural-language queries.
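
Whichever editor you wrap, the first piece is usually a small helper that builds the request URL. A possible sketch follows. The q and limit parameters and the localhost:8080 address come from the snippet above; passing the mode as a query parameter named mode is an assumption, as is the buildSearchUrl helper itself.

```typescript
type SearchMode = "hybrid" | "fts" | "vector";

interface SearchOptions {
  mode?: SearchMode; // assumed to be accepted as a `mode` query parameter
  limit?: number;
  baseUrl?: string;
}

// Builds a /search URL; URLSearchParams takes care of escaping the query text.
function buildSearchUrl(query: string, opts: SearchOptions = {}): string {
  const url = new URL("/search", opts.baseUrl ?? "http://localhost:8080");
  url.searchParams.set("q", query);
  url.searchParams.set("mode", opts.mode ?? "hybrid");
  url.searchParams.set("limit", String(opts.limit ?? 10));
  return url.toString();
}

console.log(buildSearchUrl("rate limit configuration", { mode: "fts", limit: 5 }));
```

Keeping URL construction in one place means the editor-specific part of each wrapper is only the picker UI.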

Hand it to your AI assistant

The whole reason hybrid search matters more now than in 2022 is the agent. Claude Desktop, Cursor, Continue, and custom MCP clients all want to call into “the codebase” as a tool. StrataFS exposes its search engine as a native Model Context Protocol server on port 8081. Add one config entry and your agent gets stratafs.search, which gives it real file paths to cite instead of guessed ones.
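
Under the hood, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method tools/call. Here is a sketch of what such a request for stratafs.search might look like. The envelope follows the MCP specification; the argument names query and limit are assumptions, since the tool's input schema is not shown in this article.

```typescript
// Shape of an MCP tool invocation: a JSON-RPC 2.0 request with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildSearchToolCall(id: number, query: string, limit = 5): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "stratafs.search",
      // Argument names are assumed; check the tool's advertised input schema.
      arguments: { query, limit },
    },
  };
}

console.log(JSON.stringify(buildSearchToolCall(1, "where do we handle JWT refresh"), null, 2));
```

In practice your agent's MCP client builds this message for you; the sketch is only meant to show how little sits between the agent and the search engine.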

See our Claude Desktop walkthrough for the five-minute setup.

What about Sourcegraph / Cody / Aider / Cursor?

Different tools for related problems. StrataFS is a search layer, not an IDE feature. The right comparison is to Elasticsearch (powerful but heavy) or Pinecone (vector-first, SaaS), not to a code-assist UI. We have a side-by-side comparison page if you want the detail.

What grep is still better at

Be honest about scope. Grep wins for:

  • Exact identifier renames (the kind of search that ends in a refactor).
  • One-shot questions you’d otherwise pipe to xargs.
  • Million-file monorepos where you want a streaming match, not a ranking.
  • Sub-50ms latency expectations on every keystroke (StrataFS is ~100ms warm).

If your daily flow is “I know the symbol”, keep grep. The argument here is that “I know the concept” deserves an answer too — and that answer doesn’t have to come from a SaaS vendor.

Try it

npm install -g stratafs
stratafs config init
stratafs sources add code --type local --path "$HOME/work"
stratafs serve &
stratafs search "your concept here" --mode hybrid

You’ll know in five minutes if it’s for you. The index lives in ~/.stratafs/; throw it away if you don’t want it. MIT licensed, no telemetry, runs offline. Your code stays where it is.