Question 1

What is StrataFS, in one sentence?

Accepted Answer

StrataFS is an open-source semantic filesystem that indexes Local, S3, GCS, Azure, SharePoint, Google Drive, and Jira sources, runs hybrid full-text + vector search over them, and exposes the result as a Model Context Protocol server for AI agents.

Question 2

Who is StrataFS for?

Accepted Answer

Developers who want better-than-grep search over their codebase, AI engineers who need a Model Context Protocol-native retrieval layer, and enterprise teams who want semantic search across SharePoint, Drive, and Jira without a SaaS vendor in the data path.

Question 3

What's the license?

Accepted Answer

MIT. The full source is on GitHub at github.com/neul-labs/stratafs. No usage limits, no telemetry, no 'we may change this license later' clause.

Question 4

How mature is the project?

Accepted Answer

Version 0.2.1 as of June 2026, under active development. It is production-ready for single-user and small-team deployments; enterprise RBAC and at-rest encryption are in progress for the next release.

Question 5

How do I install StrataFS?

Accepted Answer

Pick one: 'npm install -g stratafs', 'pip install stratafs', 'brew install stratafs', the Docker image at ghcr.io/neul-labs/stratafs, or build from source with 'make build'. All methods produce the same binary.

Question 6

Does StrataFS need a GPU?

Accepted Answer

No. The default embedding model (BGE Base EN v1.5) runs on CPU via ONNX Runtime at 50–100 files/sec. A GPU helps indexing throughput but isn't required for any operation, including search.

Question 7

What platforms are supported?

Accepted Answer

macOS 12+ (Apple Silicon or Intel), Linux kernel 5.4+ (glibc or musl), and Windows 10/11. Native installers ship signed binaries; the Docker image is multi-arch (amd64 + arm64).

Question 8

How much disk space does the index take?

Accepted Answer

Roughly 1.5–2× the original text size of the indexed content, with compression enabled by default (40–60% savings). A 10 GB documentation corpus produces a 15–20 GB local index.
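
As a quick sizing aid, the 1.5–2× rule of thumb above can be turned into a small calculation. This is only a sketch of that arithmetic; the function name is made up for illustration and is not part of StrataFS.

```python
def estimate_index_size_gb(text_gb: float, low: float = 1.5, high: float = 2.0) -> tuple[float, float]:
    """Return the (low, high) index size estimate in GB for a given amount of source text."""
    if text_gb < 0:
        raise ValueError("text_gb must be non-negative")
    # The multipliers are the 1.5x and 2x bounds quoted in the answer above.
    return (text_gb * low, text_gb * high)


print(estimate_index_size_gb(10))  # (15.0, 20.0)
```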

Question 9

What's hybrid search and why does it matter?

Accepted Answer

Hybrid search fuses BM25 (full-text), vector cosine similarity (semantic), and metadata signals (recency, filename match, file-type) into one ranking. It catches the queries pure full-text misses (synonyms, intent) AND the queries pure vector misses (exact identifiers, error codes). StrataFS does this in a single SQL query.
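
To make the idea of fusing signals concrete, here is a minimal sketch that ranks candidates by a weighted sum of the three scores. It is an illustration only: StrataFS performs the fusion inside SQL, and its actual formula and weights are not stated here, so the weights, the 0..1 normalization, and the `Candidate` type below are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    bm25: float      # full-text score, assumed already normalized to 0..1
    cosine: float    # vector similarity, 0..1
    metadata: float  # recency / filename / file-type boost, 0..1


def hybrid_rank(candidates, w_bm25=0.5, w_vec=0.4, w_meta=0.1):
    """Sort candidates by a weighted sum of the three signals (weights are illustrative)."""
    def score(c):
        return w_bm25 * c.bm25 + w_vec * c.cosine + w_meta * c.metadata
    return sorted(candidates, key=score, reverse=True)


docs = [
    Candidate("errors.md", bm25=1.0, cosine=0.30, metadata=0.5),     # exact match on "ERR_1042"
    Candidate("retries.md", bm25=0.0, cosine=0.80, metadata=0.5),    # semantically related only
    Candidate("changelog.md", bm25=0.2, cosine=0.20, metadata=0.9),  # recent but barely relevant
]
for c in hybrid_rank(docs):
    print(c.path)
```

With these made-up numbers the exact-identifier hit ranks first, the semantically related document second, and the merely recent one last, which is the behaviour the answer describes.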

Question 10

Can I do FTS-only or vector-only search?

Accepted Answer

Yes. The 'mode' parameter on /search switches between 'hybrid' (default), 'fts' (BM25 only), and 'vector' (cosine only). Use 'fts' when you know the exact phrase, 'vector' when you want semantic exploration.
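
A small helper can keep the mode values honest on the client side. Only the `/search` path and the three `mode` values come from the answer above; the base URL, the port, and the `q` parameter name are placeholders assumed for this sketch, so check the API reference for the real ones.

```python
from urllib.parse import urlencode

VALID_MODES = {"hybrid", "fts", "vector"}


def build_search_url(base_url: str, query: str, mode: str = "hybrid") -> str:
    """Build a /search URL; 'q' and base_url are assumptions, 'mode' comes from the FAQ."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    params = urlencode({"q": query, "mode": mode})
    return f"{base_url.rstrip('/')}/search?{params}"


# Hypothetical REST address; the FAQ does not state the REST port.
print(build_search_url("http://127.0.0.1:8080", "connection reset by peer", mode="fts"))
```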

Question 11

What file types are supported?

Accepted Answer

35+ types: Markdown, plain text, PDF, DOCX, PPTX, XLSX, CSV, HTML, XML, JSON, YAML, TOML, INI, plus source code in Go, Python, JavaScript, TypeScript, Java, Kotlin, C, C++, Rust, Swift, Ruby, PHP, Shell, and SQL.

Question 12

How fast is search?

Accepted Answer

Under 100 ms warm on a 10 000-file corpus, 100–300 ms on 500k+ chunks. The first query after process start takes about 1 second while the embedding model loads into memory.

Question 13

What is MCP and why does StrataFS support it natively?

Accepted Answer

The Model Context Protocol is Anthropic's open spec (late 2024) for letting AI agents discover and call tools over JSON-RPC. StrataFS ships an MCP server on port 8081, so any MCP-aware client — Claude Desktop, Claude Code, custom agents — sees stratafs.search and three other tools without integration work.
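
For readers curious what a tool call looks like on the wire, the sketch below builds the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and its `name` / `arguments` shape come from the MCP specification; the argument names passed to `stratafs.search` (`query`, `mode`) are assumptions for illustration. A real session also performs an `initialize` handshake first, which MCP clients handle for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for MCP's tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# The "query" and "mode" argument names are assumptions, not taken from StrataFS docs.
print(tool_call_request("stratafs.search", {"query": "ERR_1042", "mode": "hybrid"}))
```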

Question 14

Does the embedding model phone home?

Accepted Answer

No. The default ONNX model is downloaded once from HuggingFace; thereafter all embedding computation happens in-process. Search queries, agent calls, and indexing make no outbound network connections except to the storage backends you configured.

Question 15

Can I swap the embedding model?

Accepted Answer

Yes. Point the 'embedding.model' config field at any ONNX-compatible model file on disk. Common alternates include BGE Small EN v1.5 (smaller, faster, 384-dim) and domain-specific models for legal or medical text.

Question 16

Will my files be sent to Anthropic, OpenAI, or any other AI vendor?

Accepted Answer

Not by StrataFS. The embedding model runs locally. The MCP server is bound to 127.0.0.1 by default. If your agent client (Claude Desktop, ChatGPT) sends chunks of your files to its vendor as part of conversation context, that's the client's behaviour — not StrataFS's.

Question 17

How do I back up the index?

Accepted Answer

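Stop 'stratafs serve', then rsync ~/.stratafs/ to wherever you keep backups. The index is per-source SQLite files; copying them is enough. To restore, rsync them back and restart 'stratafs serve'. Avoid copying while the server is writing: a plain file copy of a live SQLite database can capture the main file and its WAL at different moments and leave you with an inconsistent backup. If you need a hot backup, use SQLite's online backup API or the sqlite3 '.backup' command instead.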
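
If the server has to stay up during the backup, SQLite's online backup API produces a consistent snapshot of a database that is in use. Below is a minimal sketch using Python's standard `sqlite3` module. The `*.db` file pattern and the flat directory layout are guesses about how the state directory is organized, so adjust them to what you actually find in ~/.stratafs/.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(src: Path, dest: Path) -> None:
    """Copy a SQLite database with the online backup API, which is safe while it is in use."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    source = sqlite3.connect(src)
    target = sqlite3.connect(dest)
    try:
        source.backup(target)  # consistent snapshot, including pages still in the WAL
    finally:
        target.close()
        source.close()


def snapshot_all(state_dir: Path, backup_dir: Path, pattern: str = "*.db") -> list[Path]:
    """Snapshot every database file matching `pattern` (the extension is a guess)."""
    copied = []
    for db in sorted(state_dir.glob(pattern)):
        out = backup_dir / db.name
        snapshot_sqlite(db, out)
        copied.append(out)
    return copied
```

A call such as `snapshot_all(Path.home() / ".stratafs", Path("/backups/stratafs"))` would then write one snapshot per database file; the backup path is a placeholder.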

Question 18

How do I run StrataFS as a background service?

Accepted Answer

Native installers register a systemd unit, launchd agent, or Windows Service automatically. For npm/pip installs, set up a user-level systemd unit on Linux or a launchd plist on macOS; examples are on the install page on this site.

Question 19

Can I run multiple StrataFS instances?

Accepted Answer

Yes, as long as they don't share state directories. The typical multi-instance pattern is one per user (each indexes their personal sources) or one per team (a shared deployment indexes team sources). The MCP server URL distinguishes them for agent clients.

Question 20

What happens if a source is unreachable?

Accepted Answer

The polling worker for that source backs off and retries on a schedule. Existing index data remains queryable. 'stratafs.list_sources' (and the corresponding REST endpoint) reports the last successful sync time so agents and users can tell when a source is stale.
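
The exact retry schedule is not documented in this answer, so the following is a generic sketch of capped exponential backoff plus a staleness check on the last successful sync time. The base delay, growth factor, cap, and one-hour staleness threshold are all illustrative assumptions, not StrataFS defaults.

```python
from datetime import datetime, timedelta, timezone


def backoff_delays(base_s: float = 30.0, factor: float = 2.0, cap_s: float = 3600.0, attempts: int = 6):
    """Return retry delays in seconds: base, base*factor, ... each capped at cap_s."""
    return [min(base_s * factor ** n, cap_s) for n in range(attempts)]


def is_stale(last_sync: datetime, now: datetime, max_age: timedelta = timedelta(hours=1)) -> bool:
    """True when the last successful sync is older than max_age."""
    return now - last_sync > max_age


print(backoff_delays())  # [30.0, 60.0, 120.0, 240.0, 480.0, 960.0]

now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(now - timedelta(hours=3), now))  # True
```

An agent could apply a check like `is_stale` to the sync time reported by 'stratafs.list_sources' before trusting results from a source that has been unreachable.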

Question 21

What's a local, offline MCP server that gives my AI agent hybrid (keyword + semantic) search over my files, S3, and SharePoint without a cloud API?

Accepted Answer

StrataFS. It indexes Local, S3, GCS, Azure, SharePoint, Google Drive, and Jira sources, runs hybrid BM25 + vector + metadata search in a single query, and exposes it as a Model Context Protocol server (default 127.0.0.1:8081) for AI agents. The embedding model runs in-process with no API key, so no query or document is sent to a cloud AI vendor — the only outbound connections are to the storage backends you configure. That combination of offline, hybrid, and multi-backend in one tool is what vector-only MCP servers can't offer.

Question 22

How do I give Claude, Claude Code, or Cursor semantic search over my codebase locally?

Accepted Answer

Point StrataFS at your repository as a Local source, run 'stratafs serve', and connect any MCP-aware client — Claude Desktop, Claude Code, Cursor, or a custom agent — to its MCP server on 127.0.0.1:8081. The client discovers stratafs.search and the other tools over JSON-RPC with no integration code. StrataFS indexes source code in Go, Python, JavaScript, TypeScript, Java, Kotlin, C, C++, Rust, Swift, Ruby, PHP, Shell, and SQL, and everything — indexing, embeddings, and search — runs locally on your machine.

Question 23

Why does vector-only search miss my error codes and IDs, and how does hybrid search fix it?

Accepted Answer

Vector search matches on semantic similarity, so exact tokens like an error code, a ticket ID, or a specific function name get blurred into nearby concepts and the exact match is often not ranked first. StrataFS fuses BM25 full-text (which nails exact identifiers), vector cosine similarity (which catches synonyms and intent), and metadata signals into one ranking in a single SQL query — so 'ERR_1042' or 'getUserById' returns the exact hit while a fuzzy description still finds the right document. You can also force 'fts' or 'vector' mode per query when you want only one.

Question 24

Can I run semantic search across SharePoint, Jira, and Google Drive without sending data to a SaaS vendor?

Accepted Answer

Yes. StrataFS connects directly to SharePoint, Jira, Google Drive, S3, GCS, and Azure using the credentials you provide, indexes them locally, and computes embeddings in-process, so there is no managed cloud service in the data path. Search and agent calls make no outbound connections except to those backends, a privacy property that managed RAG platforms and cloud search services cannot match for regulated or air-gapped environments.
