Hybrid search has become a buzzword. Vendors slap “hybrid” on their pages without explaining what’s being hybridized, why, or how to tune it. This article is the long version: the failure modes that motivated hybrid search, the math that combines the signals, the SQL that runs in StrataFS, and the knobs that matter.

Two retrievers, two failure modes

BM25 is the OG full-text ranker. It scores documents based on term frequency (how often does my query word appear in this doc) and inverse document frequency (how rare is that word across the corpus). It’s been the workhorse of search engines since 1994 because it’s fast, interpretable, and right for most keyword queries.

BM25 fails when the user’s vocabulary doesn’t match the document’s vocabulary. Classic example: query “car insurance”; the document says “vehicle policy”. Zero overlap → zero score → not retrieved. Synonym dictionaries patch this, slowly and incompletely.

Vector retrieval embeds both the query and the documents into a high-dimensional space (typically 384 or 768 dimensions for code/text), then ranks by cosine similarity. Documents whose embeddings are close in vector space score high. “Car insurance” and “vehicle policy” embed near each other, problem solved.

Vector fails on the inverse case. Exact identifiers — OAuth2Provider, IT_ERR_42, kubectl get pods — are uninformative as sentences. The embedding of OAuth2Provider doesn’t say “this is the OAuth2Provider class”; it says “this is some compound technical noun”. Two unrelated identifiers can embed close together. Vector-only search for OAuth2Provider will reliably surface OAuthClientProvider, Auth0Provider, and other near-misses.

The honest summary: BM25 is precise but literal; vectors are flexible but blurry. Most queries need both.

What hybridization means, exactly

You can hybridize in two places:

  1. At the rank level (RRF): Run both retrievers, get two ranked lists, fuse the ranks using reciprocal rank fusion: score(d) = Σ 1 / (k + rank_i(d)). Robust because it doesn’t care about score magnitudes — only positions.
  2. At the score level (weighted fusion): Normalize each retriever’s score to [0,1], compute a weighted sum: score(d) = w_fts * bm25(d) + w_vec * cosine(d) + w_meta * metadata(d). Less robust without tuning; more controllable with it.

StrataFS uses weighted fusion with an additional metadata signal. The reasoning:

  • We already need normalization for the metadata score (recency, filename match, file-type bonus) — adding BM25 and vector to the same scale costs nothing.
  • Code+docs corpora are heterogeneous; an operator who knows their corpus (say, “filenames are very informative here”) should be able to dial that in.
  • RRF discards score confidence, which is information we want to keep — a hybrid result with both signals strong is better than one with only one signal strong, even if the ranks happen to align.

The SQL, in real life

Here’s the conceptual query StrataFS builds:

WITH
  -- FTS5 BM25 from SQLite's full-text index.
  fts AS (
    SELECT chunk_id, bm25(file_chunks_fts) AS raw_score
    FROM file_chunks_fts
    WHERE file_chunks_fts MATCH :query
    LIMIT 200
  ),

  -- Vector cosine from sqlite-vec, k=50 nearest.
  vec AS (
    SELECT chunk_id, 1.0 - distance AS raw_score
    FROM file_chunks_vec
    WHERE embedding MATCH :query_embedding AND k = 50
  ),

  -- Metadata: recency + filename match + file-type bonus.
  meta AS (
    SELECT id AS chunk_id,
      recency_score(updated_at) * :w_recency +
      filename_match(:query, path)     * :w_filename +
      filetype_bonus(content_type)     * :w_filetype AS raw_score
    FROM file_chunks
    WHERE path GLOB :path_filter
  ),

  -- Min-max normalize each retriever to [0, 1].
  fts_n  AS (SELECT chunk_id, (raw_score - MIN_FTS)  / (MAX_FTS  - MIN_FTS  + 1e-9) AS score FROM fts),
  vec_n  AS (SELECT chunk_id, (raw_score - MIN_VEC)  / (MAX_VEC  - MIN_VEC  + 1e-9) AS score FROM vec),
  meta_n AS (SELECT chunk_id, (raw_score - MIN_META) / (MAX_META - MIN_META + 1e-9) AS score FROM meta)

SELECT c.*, w.total
FROM file_chunks c
JOIN (
  SELECT chunk_id,
    COALESCE(fts_n.score,  0) * :w_fts  +
    COALESCE(vec_n.score,  0) * :w_vec  +
    COALESCE(meta_n.score, 0) * :w_meta AS total
  FROM fts_n
    FULL OUTER JOIN vec_n  USING (chunk_id)
    FULL OUTER JOIN meta_n USING (chunk_id)
) w ON c.id = w.chunk_id
ORDER BY w.total DESC
LIMIT :limit;

A few notes:

  • The FTS limit of 200 and the vector k of 50 are over-fetches. We need more candidates than we’ll return because the re-ranking step might promote a chunk that BM25 placed at rank 80 but the metadata + vector signals lift to rank 3.
  • MIN_FTS, MAX_FTS, etc. are subqueries collapsed for readability. In the real query they’re computed in-line.
  • The FULL OUTER JOIN means a chunk that’s in only one of the three retrievers still gets a score (just with two zeros). This is critical: vector-only retrieval still has to surface results when BM25 misses entirely.

The weights you should care about

Three weights matter:

  • w_fts (default 0.35): how much BM25 contributes.
  • w_vec (default 0.50): how much vector similarity contributes.
  • w_meta (default 0.15): how much metadata (recency, filename, file-type) contributes.

If you tune one number, tune w_vec. The default of 0.50 is good for code+docs corpora. Pure prose corpora benefit from raising it to 0.65 — the vocabulary mismatch is more severe in natural language than in code (which is full of identifiers BM25 nails). Logs and API-heavy corpora benefit from lowering it to 0.35; identifiers carry too much signal to dilute.

The metadata weight is the most underrated. A w_meta of 0.15 means recency, filename, and file-type each contribute around 5% to the final score — small but enough to break ties when BM25 and vector both surface multiple candidates.

When weighted fusion goes wrong

Two failure modes to know about:

Score blow-out. If your corpus has wildly skewed BM25 distributions (most queries return one document with score 18 and a long tail with score 1), normalization clusters everything near the boundaries. The fix is score capping — clip BM25 raw scores at the 95th percentile before normalizing. StrataFS does this by default.

Vector domination. If your embedding model was trained on a domain very close to your corpus (say, code-specific embeddings on a Go codebase), vector scores will be uniformly high. Hybrid then degenerates to vector-only. The fix is score-floor — subtract the median raw score before normalizing, so the dynamic range is preserved.

Mode-only retrieval is still useful

StrataFS exposes mode=fts and mode=vector flags for the cases where you want one retriever exclusively. Two situations come up:

  1. You know the exact phrase. Searching for a configuration key or an error code — use mode=fts. The vector signal is noise.
  2. You want to explore semantic neighbours. “What else looks like this rate-limiting code” — use mode=vector. BM25 will anchor you back to literal matches and you’ll miss the interesting neighbours.

For everything else, leave it on hybrid. It’s the default for a reason.

SQLite is the unsung hero here

The reason all of this is feasible in a single SQL query is that FTS5 and sqlite-vec live in the same database file. There’s no cross-process call, no separate query coordinator, no pagination dance. The query planner has visibility into both indexes and the metadata table simultaneously.

This is qualitatively different from “we have a Postgres FTS column and we also call Pinecone”. There, the rank fusion has to happen in your application code, with two roundtrips and two paginations. Sub-100ms hybrid search across 100k chunks is just not a thing in that architecture; it’s routinely achievable when both indexes are local SQL.

The interesting design choice in StrataFS isn’t the ranking algorithm — RRF and weighted fusion are well-studied. The interesting choice is putting both indexes in one SQLite file, so the planner can see them both.

Tuning, in 15 minutes

You don’t need to be precise about weights. You need to be self-consistent. Take 20 queries you know the right answers to, run them against StrataFS at the default weights, score yourself on whether the right doc is in the top 3. Then nudge w_vec by 0.1, re-run, re-score. Stop when you stop seeing improvement.

For most people, the default weights are right. If you’re tuning beyond 30 queries, you’re either chasing a corpus with unusual structure (logs, time-series, generated docs) or you’re over-optimizing.

Where to read more

  • The search engine page has the StrataFS-specific knobs and the query parameter reference.
  • sqlite-vec is the vector extension; worth understanding its HNSW parameters if you’re indexing more than a million chunks.
  • The original BM25 paper is “Okapi at TREC-3” (Robertson et al., 1994) — fifty pages, half of which are still relevant.

Hybrid search isn’t a buzzword. It’s the right answer to two different failure modes that both bite you, every day. Run BM25. Run vectors. Fuse them. StrataFS handles the SQL.