The phrase “embedded search engine” used to describe Lucene-in-a-jar setups: useful, but heavy and JVM-bound. The post-2024 reality is different. SQLite, with the FTS5 module that’s been there for a decade and the new sqlite-vec extension, can run hybrid search — full text and vector, fused — in a single file, in a single query, with no daemon.
This is the technical write-up of how that works, why we built StrataFS on it, and how to use the same pattern in your own application.
The two extensions you need
FTS5 has shipped with SQLite since 2015. It provides a virtual table that maintains a posting-list inverted index over a text column, supports BM25 ranking out of the box, and handles Unicode-aware tokenization. To use it, build SQLite with -tags fts5 (Go) or --enable-fts5 (C); most distributions already do.
-- Create the FTS index.
CREATE VIRTUAL TABLE file_chunks_fts
USING fts5(content, tokenize = 'unicode61 remove_diacritics 2');
-- Insert (paired with the regular table by rowid).
INSERT INTO file_chunks_fts(rowid, content) VALUES (1492, 'Refresh token handling...');
-- Query with BM25.
SELECT rowid, bm25(file_chunks_fts) AS score
FROM file_chunks_fts
WHERE file_chunks_fts MATCH 'refresh token'
ORDER BY score LIMIT 10;
sqlite-vec is the newer addition (by Alex Garcia, 2024). It provides a virtual table for vector embeddings, supports cosine + Euclidean + L2 distance, and exposes HNSW (hierarchical navigable small world) indexing for sub-linear nearest-neighbour search. Install it as a loadable extension or compile it in.
-- Create the vector index, 768-dim BGE Base embeddings.
CREATE VIRTUAL TABLE file_chunks_vec
USING vec0(embedding FLOAT[768]);
-- Insert.
INSERT INTO file_chunks_vec(rowid, embedding) VALUES (1492, vec_f32(:bytes));
-- Nearest-neighbour query.
SELECT rowid, distance
FROM file_chunks_vec
WHERE embedding MATCH :query_embedding
ORDER BY distance LIMIT 10;
Two virtual tables, two query patterns. The magic happens when they live in the same database.
One database file, two indexes
Most teams hit hybrid search and reach for two systems: Elasticsearch for FTS, Pinecone for vectors. The application then has to issue two queries, paginate independently, fuse the ranks, and pray the two systems don’t disagree about what’s indexed.
In SQLite, you avoid all of that:
-- One database, three tables (regular + two virtual indexes).
CREATE TABLE file_chunks(
id INTEGER PRIMARY KEY,
path TEXT NOT NULL,
content TEXT NOT NULL,
embedding BLOB,
updated_at INTEGER
);
CREATE VIRTUAL TABLE file_chunks_fts
USING fts5(content, content='file_chunks', content_rowid='id');
CREATE VIRTUAL TABLE file_chunks_vec
USING vec0(embedding FLOAT[768]);
The FTS index uses content='file_chunks' mode — it stores no extra copy of the text, only the posting lists, pointing back to the main table by rowid. Both indexes are kept in sync with the main table via triggers, which makes update semantics trivial:
CREATE TRIGGER file_chunks_after_insert AFTER INSERT ON file_chunks BEGIN
INSERT INTO file_chunks_fts(rowid, content) VALUES (new.id, new.content);
INSERT INTO file_chunks_vec(rowid, embedding) VALUES (new.id, new.embedding);
END;
Insert a row to file_chunks; both indexes update transactionally. There’s no “is FTS caught up to the source of truth” question, because they’re in the same WAL transaction.
The hybrid query
With both indexes in one database, the hybrid query is a single SQL statement:
WITH
fts AS (
SELECT rowid AS id, bm25(file_chunks_fts) AS s
FROM file_chunks_fts
WHERE file_chunks_fts MATCH :q
),
vec AS (
SELECT rowid AS id, 1.0 - distance AS s
FROM file_chunks_vec
WHERE embedding MATCH :q_emb AND k = 50
)
SELECT c.id, c.path, c.content,
COALESCE(f.s, 0) * 0.35 + COALESCE(v.s, 0) * 0.65 AS score
FROM file_chunks c
LEFT JOIN fts f USING (id)
LEFT JOIN vec v USING (id)
WHERE f.s IS NOT NULL OR v.s IS NOT NULL
ORDER BY score DESC LIMIT 10;
Two CTEs, one per retriever; a final SELECT that fuses scores. No application-layer rank fusion. No two-phase pagination. SQLite’s query planner sees both index reads and fuses them in C, which means the whole query runs in milliseconds on a normal laptop.
Why this is qualitatively different from “Postgres + Pinecone”
You can do hybrid search with Postgres’s tsvector + Pinecone-as-a-service. People do. The differences that matter:
- Latency: two network roundtrips (
tsquery+ Pinecone), vs. zero. On a warm SQLite query you’ll see 30–80 ms hybrid; on the two-service architecture you’re at 150–300 ms before the LLM call. - Operability: SQLite is a file you can
scp. Two-service hybrid means two services to monitor, two backups to coordinate, two failure modes to chase. - Consistency: one transaction inserts to both indexes atomically. With two services you have to design around the window where the FTS index has the new row but the vector index doesn’t.
- Cost: SQLite is free. Pinecone bills by vector count, with non-trivial floor pricing. For agent-side retrieval where every developer has their own index, the per-developer marginal cost of SQLite is zero.
The trade-off is scale. SQLite tops out around 10–100M chunks per database file before you start hitting query-time edge cases. StrataFS sidesteps this by giving every source its own database — the per-source files stay small, and the application fans out queries when needed.
The performance you actually get
Numbers from a 2024 MacBook Air, 16 GB RAM, indexing a mix of Go code and Markdown docs:
- 50–100 files/sec indexing throughput (parsing + embedding is the bottleneck, not SQLite).
- ~30 MB disk per 10k chunks with compression on.
- 80 ms warm hybrid query on a 10k-chunk corpus.
- 140 ms warm hybrid query on a 200k-chunk corpus.
- 1.2 s cold first query (embedding model load).
These are application-level numbers; the bare SQLite query is much faster.
The mistakes that bite
Three setup mistakes that cost real time:
1. Forgetting the FTS5 build tag. In Go, go build without -tags "fts5" silently links a SQLite without FTS5. Your CREATE VIRTUAL TABLE statement returns “no such module: fts5”. Always: go build -tags "fts5". StrataFS’s Makefile encodes this; if you’re DIY-ing, set it everywhere — build, test, CI.
2. Storing embeddings as BLOBs in file_chunks and in file_chunks_vec. You’re storing the same 3KB twice per chunk. Use file_chunks_vec exclusively for vector queries; if you need to read raw embeddings, expose a function that joins back.
3. Re-embedding on every restart. The vector column is your cache. If you’re recomputing embeddings on startup because you “want to be sure they’re up to date”, you’re throwing away the most expensive operation in the pipeline. Hash the content, skip re-embedding when hashes match.
When to reach for SQLite (and when not to)
Reach for SQLite + FTS5 + sqlite-vec when:
- The index lives on the same machine as the consumer (agent, IDE, CLI).
- Corpus is < 50M chunks total, or partitionable into < 10M per file.
- You want minimal operational surface area.
- You need atomicity between full-text and vector updates.
Don’t reach for it when:
- The corpus is shared across many concurrent writers.
- You need > 50M chunks in one queryable scope and can’t partition.
- You already have a heavy Elasticsearch cluster and the marginal cost of adding vector to it is small.
For everything in the first list, SQLite is the right tool. For StrataFS specifically — a per-user, per-source semantic index that has to start cold in under a second — it was the only sensible choice.
Read more
- sqlite-vec README is the canonical reference for the vector extension.
- SQLite FTS5 docs — long but well-written; skim the tokenizer section if you have non-English text.
- Our hybrid search deep-dive covers the ranking math separately from the storage choice.
- Or just install StrataFS and read
pkg/database/schema.sqlin the repo — the reference implementation is ~300 lines.