The question we get most often, in some form: “We already run Elasticsearch. Why would we use StrataFS instead of bolting vector search onto Elastic?”

This is the long version of the answer. Elasticsearch is excellent. StrataFS solves a different shape of problem. Knowing which problem you have is most of the work.

The honest summary

| Dimension | StrataFS | Elasticsearch |
|---|---|---|
| Deployment model | Single binary, SQLite on disk | JVM cluster (1–N nodes) |
| Operational surface | None (it’s a file) | Cluster ops, JVM tuning, shards |
| Cold start time | ~1 second | 30–60 seconds (node + warm-up) |
| Warm hybrid query (10k chunks) | < 100 ms | < 100 ms |
| Warm hybrid query (10M chunks) | 200–400 ms (one source per DB) | 100–200 ms |
| Warm hybrid query (1B chunks) | Not supported in one query | 100–300 ms (sharded) |
| Multi-tenant in one cluster | One DB per source | Native |
| MCP server for AI agents | Built in | Plugin/middleware |
| Native embedding generation | Built in (ONNX local) | External (or paid sub for ELSER) |
| License | MIT | Elastic License v2 (source-available) |
| Cost at 10M chunks | $0 + your laptop | $$ (cluster + ops) |
| Cost at 1B chunks | Not the right tool | $$$ but feasible |

Read this table as: “the costs are different at every scale, and the question is which scale you’re at.”

Where StrataFS wins

Per-user / per-developer / per-team deployment. When the consumer of search is one person or one team, StrataFS’ single-binary, zero-ops model is qualitatively better. You install it like a CLI. No cluster.

Agent-side retrieval. Elasticsearch isn’t built for running a search engine on each developer’s laptop that indexes their code and exposes it via MCP, and the friction shows. StrataFS is built for it.

Hybrid FTS + vector in one query, with low effort. Elasticsearch does both individually, and combining them is a tractable engineering project. StrataFS does it out of the box and the query is one SQL statement. The default just works.

Multi-storage indexing. StrataFS reads from Local + S3 + GCS + Azure + SharePoint + Drive + Jira in a unified way. Elasticsearch needs an ingest pipeline per source.

Operational simplicity. No JVM. No GC pauses. No cluster split-brain. The “ops handbook” for StrataFS is “rsync the SQLite files for backup; upgrade the binary in place”.
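Copying a SQLite file with rsync is safe when nothing is writing to it. If an index might be updated during the copy, SQLite’s online backup API produces a consistent snapshot instead. The sketch below uses Python’s standard `sqlite3.Connection.backup`. It assumes the index files are ordinary SQLite databases, as the table above suggests. The function name and any paths are hypothetical and not part of StrataFS.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path as a consistent snapshot.

    Uses SQLite's online backup API, so the copy stays consistent even
    if another process writes to the source while it runs.
    """
    Path(dest_path).parent.mkdir(parents=True, exist_ok=True)
    # Open the source read-only so a mistyped path raises an error
    # instead of silently creating an empty database.
    src_uri = Path(src_path).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(src_uri, uri=True)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # default pages=-1 copies the whole database in one step
    finally:
        dest.close()
        src.close()
```

A call such as `snapshot_sqlite("index/source.db", "backups/source.db")` (hypothetical paths) could run from cron, with rsync then shipping the snapshot directory off the machine.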

Where Elasticsearch wins

Billion-scale corpora. Elasticsearch shards across nodes. StrataFS doesn’t. At >50M chunks per logical scope, Elastic is the right tool.

Shared multi-tenant infrastructure. A central team running Elasticsearch for many product teams gets economies of scale that per-developer StrataFS instances don’t.

Analytics and aggregations. Elasticsearch’s analytics features (date histograms, geo aggregations, Kibana dashboards) are best-in-class. StrataFS is a search engine, not an analytics platform.

Mature ecosystem. Logstash, Beats, Kibana, the whole ELK story. StrataFS is young and doesn’t try to be all of those things.

Existing institutional knowledge. If your platform team already runs Elastic and is good at it, the marginal cost of adding vector search to Elastic is lower than introducing a new tool.

The hybrid-search story, specifically

Elasticsearch supports both full-text (default) and dense vector search (since 8.0). You can combine them with RRF or a custom script_score. Done well, it works. Done quickly with defaults, it’s just OK.
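For a sense of what the RRF route looks like, here is a minimal sketch that builds a hybrid `_search` request body in Python. It uses the `retriever` syntax documented for recent 8.x releases; older 8.x versions expressed RRF differently, so check the documentation for the version you run. The field names `body` and `embedding`, the helper name, and the sizing heuristics are assumptions made for illustration.

```python
import json


def build_rrf_hybrid_request(query_text, query_vector,
                             text_field="body", vector_field="embedding", k=10):
    """Build an Elasticsearch _search body fusing BM25 and kNN results with RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical leg: a standard full-text match query.
                    {"standard": {"query": {"match": {text_field: query_text}}}},
                    # Vector leg: approximate kNN over a dense_vector field.
                    {"knn": {
                        "field": vector_field,
                        "query_vector": list(query_vector),
                        "k": k,
                        # Illustrative heuristic, capped at 10000.
                        "num_candidates": min(max(k * 10, 100), 10000),
                    }},
                ],
                # How many results from each leg take part in the fusion.
                "rank_window_size": max(k, 50),
                "rank_constant": 60,
            }
        },
        "size": k,
    }


body = build_rrf_hybrid_request("rotate api keys", [0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The query vector still has to come from an embedder you call yourself, which is where the third friction point below comes in.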

Three friction points show up:

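  1. Vector indexing cost. Elasticsearch’s HNSW build is good but ingest-heavy; you’ll want to provision specifically for it. SQLite + sqlite-vec inserts vectors in milliseconds per chunk, because current sqlite-vec releases scan vectors by brute force rather than building an HNSW graph. The trade-off is query time that grows with corpus size.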
  2. Weighted rank fusion is application-level. Elasticsearch provides RRF built in, but tuning weighted fusion with metadata signals requires script_score queries. StrataFS does this in plain SQL.
  3. You bring your own embedder. Elastic has ELSER (their proprietary sparse encoder) and supports calling external embedders. StrataFS runs ONNX in-process. The difference is zero API calls and zero rate-limits.
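To make the second point concrete, this is roughly what weighted rank fusion computes, independent of either engine. It is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python, not StrataFS’s SQL or Elastic’s implementation. The function name, the retriever names, and the weights are illustrative assumptions.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights=None, k=60):
    """Fuse ranked ID lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping a retriever name to its result IDs, best first.
    weights:  optional dict of per-retriever weights (default 1.0 each).
    k:        smoothing constant; 60 is the value commonly used for RRF.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for name, ids in rankings.items():
        w = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ids, start=1):
            # Each retriever contributes weight / (k + rank) to a document.
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


fused = weighted_rrf(
    {"fts": ["a", "b", "c"], "vector": ["c", "a", "d"]},
    weights={"fts": 1.0, "vector": 1.5},
)
print(fused)  # -> ['a', 'c', 'd', 'b']
```

Metadata signals such as recency or source priority would enter as extra weighted terms in the same sum. That extra scoring is what needs script_score on the Elastic side.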

For a new hybrid-search project, the StrataFS path is shorter. For extending an existing Elastic deployment, the Elastic path is shorter.

The case for running both

In some teams, the right answer is “both, for different things”:

  • Elasticsearch for shared corpora — central docs, cross-team logs, customer-data search. Run by the platform team. Scaled, observed, integrated.
  • StrataFS for personal/team semantic search — each developer’s codebase, an AI agent’s knowledge tool, departmental Drive folders. Run by the user. Zero ops.

These don’t compete. They cover different points on the developer-experience spectrum.

Concrete decision tree

If you can answer “yes” to most of these, StrataFS is the right tool:

  • The consumer of search is one person or one small team.
  • The corpus is under 50M chunks per logical scope.
  • You want hybrid FTS + vector in one query with zero ops.
  • You want an MCP server for AI agents without writing one.
  • You’d rather upgrade a binary than run a JVM cluster.

If you can answer “yes” to most of these, Elasticsearch is the right tool:

  • The corpus is billion-scale or growing toward it.
  • Search is a shared service across many internal customers.
  • You need Kibana / analytics / aggregations alongside search.
  • You already operate Elastic well and don’t want to introduce a new tool.
  • You need search SLAs at sustained high throughput, up to millions of QPS.

If you’re somewhere in the middle, install StrataFS for a week. Five minutes to install, an hour to point at your corpus, a week of natural usage to know whether it’s enough.

A pattern that often works

Start with StrataFS for the per-developer use case (codebase + personal docs + agent retrieval). Keep Elasticsearch (or whatever you have) for the shared service. If StrataFS naturally replaces enough of the shared use case over time (many teams find it does), consolidate. If it doesn’t, you still have a per-developer tool that never added to your Elastic cluster bill.

This is the inverse of the usual “let’s run one search service for everyone” pattern. It works because per-developer search is a different shape of problem than enterprise search, and forcing both into one architecture is what made enterprise search expensive in the first place.

Both tools are good. They solve different shapes of problem. Pick the tool whose shape matches yours.