Questions we keep getting (and answers we keep giving).
Product questions
What is StrataFS, in one sentence?
StrataFS is an open-source semantic filesystem that indexes Local, S3, GCS, Azure, SharePoint, Google Drive, and Jira sources, runs hybrid full-text + vector search over them, and exposes the result as a Model Context Protocol server for AI agents.
Who is StrataFS for?
Developers who want better-than-grep search over their codebase, AI engineers who need a Model Context Protocol-native retrieval layer, and enterprise teams who want semantic search across SharePoint, Drive, and Jira without a SaaS vendor in the data path.
What's the license?
MIT. The full source is on GitHub at github.com/neul-labs/stratafs. No usage limits, no telemetry, no 'we may change this license later' clause.
How mature is the project?
Version 0.2.1 as of June 2026, and under active development. It is production-ready for single-user and small-team deployments; enterprise RBAC and at-rest encryption are being built for the next release.
Install questions
How do I install StrataFS?
Pick one: 'npm install -g stratafs', 'pip install stratafs', 'brew install stratafs', the Docker image at ghcr.io/neul-labs/stratafs, or build from source with 'make build'. Every method gives you the same stratafs binary.
Does StrataFS need a GPU?
No. The default embedding model (BGE Base EN v1.5) runs on CPU via ONNX Runtime at 50–100 files/sec. A GPU helps indexing throughput but isn't required for any operation, including search.
What platforms are supported?
macOS 12+ (Apple Silicon or Intel), Linux kernel 5.4+ (glibc or musl), and Windows 10/11. Native installers ship signed binaries; the Docker image is multi-arch (amd64 + arm64).
How much disk space does the index take?
Roughly 1.5–2× the original text size of the indexed content with compression on, which is the default and saves 40–60%. For example, a 10 GB documentation corpus produces a 15–20 GB local index.
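For capacity planning on other corpus sizes, the same rule of thumb can be written down as a tiny helper. This is only the arithmetic from the answer above; the function name is made up for illustration and is not part of StrataFS.

```python
def estimate_index_size_gb(corpus_text_gb, low_factor=1.5, high_factor=2.0):
    """Return (low, high) index size in GB for a given amount of source text.

    The 1.5x and 2x factors are the rule of thumb quoted in this FAQ.
    """
    if corpus_text_gb < 0:
        raise ValueError("corpus size must be non-negative")
    return corpus_text_gb * low_factor, corpus_text_gb * high_factor


low, high = estimate_index_size_gb(10)
print(f"10 GB of text: plan for {low:.0f} to {high:.0f} GB of index")
```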
Search questions
What's hybrid search and why does it matter?
Hybrid search fuses BM25 (full-text), vector cosine similarity (semantic), and metadata signals (recency, filename match, file type) into one ranking. It catches the queries that pure full-text search misses (synonyms, intent) and the queries that pure vector search misses (exact identifiers, error codes). StrataFS does this in a single SQL query.
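To make the fusion idea concrete outside SQL, here is a minimal sketch: a weighted sum of min-max normalized BM25 and cosine scores plus a metadata bonus. The weights, the normalization, and the field names are assumptions chosen for illustration; this is not StrataFS's actual ranking formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    bm25: float        # raw full-text score, higher is better
    cosine: float      # vector similarity in [-1, 1]
    meta_bonus: float  # hypothetical recency/filename/file-type signal in [0, 1]


def _min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(candidates, w_fts=0.4, w_vec=0.5, w_meta=0.1):
    """Rank candidate paths by a weighted sum of normalized signals."""
    if not candidates:
        return []
    bm25_n = _min_max([c.bm25 for c in candidates])
    cos_n = _min_max([c.cosine for c in candidates])
    scored = [
        (w_fts * b + w_vec * v + w_meta * c.meta_bonus, c.path)
        for c, b, v in zip(candidates, bm25_n, cos_n)
    ]
    return [path for _, path in sorted(scored, reverse=True)]


ranked = fuse([
    Candidate("errors.md", bm25=12.0, cosine=0.30, meta_bonus=0.0),      # exact identifier hit
    Candidate("retry-guide.md", bm25=0.0, cosine=0.90, meta_bonus=0.0),  # semantic hit only
    Candidate("changelog.md", bm25=3.0, cosine=0.35, meta_bonus=0.0),    # weak on both
])
print(ranked)
```

In this toy data the document that matches only semantically and the one that matches only on the exact term both rank above the weak match, which is the behaviour hybrid ranking is after.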
Can I do FTS-only or vector-only search?
Yes. The 'mode' parameter on /search switches between 'hybrid' (default), 'fts' (BM25 only), and 'vector' (cosine only). Use 'fts' when you know the exact phrase, 'vector' when you want semantic exploration.
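A small sketch of switching modes from a script. Only the /search path and the three 'mode' values come from the answer above; the base URL, the port, the use of GET query parameters, and the 'q' parameter name are assumptions, so check them against your instance before relying on this.

```python
from urllib.parse import urlencode

VALID_MODES = ("hybrid", "fts", "vector")  # the three modes listed above


def build_search_url(base_url, query, mode="hybrid"):
    """Build a /search URL. The 'q' parameter name is a guess, not documented API."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    params = urlencode({"q": query, "mode": mode})
    return f"{base_url.rstrip('/')}/search?{params}"


# Hypothetical base URL; substitute the address your instance listens on.
print(build_search_url("http://127.0.0.1:8080", "ECONNRESET retry", mode="fts"))
```

Validating the mode on the client side keeps a typo such as 'vectors' from silently falling through to whatever the server does with unknown values.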
What file types are supported?
35+ types: Markdown, plain text, PDF, DOCX, PPTX, XLSX, CSV, HTML, XML, JSON, YAML, TOML, INI, plus source code in Go, Python, JavaScript, TypeScript, Java, Kotlin, C, C++, Rust, Swift, Ruby, PHP, Shell, and SQL.
How fast is search?
Under 100 ms warm on a 10 000-file corpus, 100–300 ms on 500k+ chunks. The first query after process start takes about 1 second while the embedding model loads into memory.
AI agent questions
What is MCP and why does StrataFS support it natively?
The Model Context Protocol is Anthropic's open spec (late 2024) for letting AI agents discover and call tools over JSON-RPC. StrataFS ships an MCP server on port 8081, so any MCP-aware client — Claude Desktop, Claude Code, custom agents — sees stratafs.search and three other tools without integration work.
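For the curious, this is roughly what a tool call looks like on the wire. MCP messages are JSON-RPC 2.0 objects, and 'tools/list' and 'tools/call' are the protocol's methods for discovering and invoking tools. The 'query' argument name is an assumption; the real argument schema is whatever the server reports in its 'tools/list' response. A real client also performs an initialization handshake first, which is left out here.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object as used by MCP."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


# Ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Call the search tool. The argument name is hypothetical (see note above).
call_search = jsonrpc_request(
    "tools/call",
    {"name": "stratafs.search", "arguments": {"query": "rotate API keys"}},
)

print(json.dumps(call_search, indent=2))
```

MCP-aware clients build these messages for you; the sketch is only meant to show that there is nothing StrataFS-specific in the envelope.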
Does the embedding model phone home?
No. The default ONNX model is downloaded once from Hugging Face; after that, all embedding computation happens in-process. Search queries, agent calls, and indexing make no outbound network connections except to the storage backends you configured.
Can I swap the embedding model?
Yes. Point the 'embedding.model' config field at any ONNX-compatible model file on disk. Common alternates include BGE Small EN v1.5 (smaller, faster, 384-dim) and domain-specific models for legal or medical text.
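One practical effect of the dimension count is vector storage. The sketch below compares the raw footprint of 768-dim vectors (BGE Base EN v1.5) with 384-dim vectors (BGE Small EN v1.5). It assumes vectors are stored as 4-byte floats and ignores any index overhead or compression, so treat the numbers as a rough lower-level comparison rather than what StrataFS actually writes to disk.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as 32-bit floats


def raw_vector_bytes(num_chunks, dims):
    """Raw size of num_chunks float32 embeddings, ignoring index overhead."""
    return num_chunks * dims * BYTES_PER_FLOAT32


for name, dims in [("BGE Base EN v1.5", 768), ("BGE Small EN v1.5", 384)]:
    gb = raw_vector_bytes(500_000, dims) / 1e9
    print(f"{name}: {dims} dims, about {gb:.2f} GB for 500k chunks")
```

Halving the dimension halves the raw vector storage; whether the smaller model's retrieval quality is good enough depends on your content.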
Will my files be sent to Anthropic, OpenAI, or any other AI vendor?
Not by StrataFS. The embedding model runs locally. The MCP server is bound to 127.0.0.1 by default. If your agent client (Claude Desktop, ChatGPT) sends chunks of your files to its vendor as part of conversation context, that's the client's behaviour — not StrataFS's.
Operations questions
How do I back up the index?
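Stop 'stratafs serve', then rsync ~/.stratafs/ to wherever you keep backups. The index is a set of per-source SQLite files, so copying them is enough. To restore, rsync them back and start 'stratafs serve' again. Avoid copying while the server is writing: WAL mode makes each commit atomic, but a file-level copy of a live database can capture the main file and its -wal file at different moments and leave you with an inconsistent backup. If you need a backup without downtime, use SQLite's own online backup instead (the sqlite3 '.backup' command or 'VACUUM INTO').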
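If the server may still be writing while you back up, SQLite's online backup API takes a consistent snapshot of each database. Here is a minimal sketch using Python's standard sqlite3 module. The '*.db' file pattern is a guess at how the index files are named, and both function names are made up for this example.

```python
import sqlite3
from pathlib import Path


def backup_sqlite_file(src, dst):
    """Copy one SQLite database using the online backup API."""
    source = sqlite3.connect(src)
    target = sqlite3.connect(dst)
    try:
        source.backup(target)  # consistent snapshot even with a concurrent writer
    finally:
        target.close()
        source.close()


def backup_index(state_dir, backup_dir, pattern="*.db"):
    """Back up every file matching pattern; return the copied file names.

    The "*.db" pattern is an assumption about the index file naming.
    """
    state_dir, backup_dir = Path(state_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(state_dir.glob(pattern)):
        backup_sqlite_file(src, backup_dir / src.name)
        copied.append(src.name)
    return copied
```

A call such as backup_index(Path.home() / ".stratafs", "/mnt/backups/stratafs") would then copy each database in turn; the destination path here is only an example.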
How do I run StrataFS as a background service?
Native installers automatically register a systemd unit, a launchd job, or a Windows Service, depending on the platform. For npm/pip installs, set up a user-level systemd unit on Linux or a launchd plist on macOS; examples are on the install page on this site.
Can I run multiple StrataFS instances?
Yes, as long as they don't share state directories. The typical multi-instance pattern is one per user (each indexes their personal sources) or one per team (a shared deployment indexes team sources). The MCP server URL distinguishes them for agent clients.
What happens if a source is unreachable?
The polling worker for that source backs off and retries on a schedule. Existing index data remains queryable. 'stratafs.list_sources' (and the corresponding REST endpoint) reports the last successful sync time so agents and users can tell when a source is stale.
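The two ideas in this answer, backing off between retries and judging staleness from the last successful sync, can be sketched as follows. The doubling schedule, the 30-second base, the one-hour cap, and the six-hour staleness threshold are all assumptions for illustration; the answer above only says that the worker backs off and that the last sync time is reported. In practice the timestamp would come from 'stratafs.list_sources', whose response format is not shown here.

```python
from datetime import datetime, timedelta, timezone


def backoff_delay(attempt, base=30.0, cap=3600.0):
    """Seconds to wait before retry number `attempt` (0-based), doubling up to cap."""
    return min(cap, base * (2 ** attempt))


def is_stale(last_sync, now, max_age=timedelta(hours=6)):
    """True when the last successful sync is older than max_age."""
    return now - last_sync > max_age


# Delay before each of the first eight retries.
print([backoff_delay(n) for n in range(8)])

# A source last synced nine hours ago, checked against the six-hour threshold.
now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2026, 6, 1, 3, 0, tzinfo=timezone.utc), now))
```

An agent could run a check like is_stale before trusting results from a source and mention the age of the data in its answer when the check fails.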