Seven backends. One query interface. Your data stays put.
StrataFS reads from your storage. It does not move, copy, or transform
the source files. Indexes live in a parallel .stratafs/
directory, one SQLite database per source.
- Local filesystem (real-time): fsnotify-driven, instant updates when files change.
- Amazon S3 (polled): polling-based sync, IAM-friendly read-only credentials.
- Google Cloud Storage (polled): service account JSON, bucket-level scoping.
- Azure Blob Storage (polled): account key or SAS, container-level scoping.
- SharePoint / OneDrive (polled): Microsoft Graph delta API, enterprise-ready.
- Google Drive (polled): OAuth2 + native Docs export.
- Jira (polled): issues, descriptions, and attachments via the REST API.
How sources are configured
One TOML block per source. Every block has the same shape across all
backends: a name and a type, plus the backend-specific keys for location and credentials.
```toml
# config.toml
[[sources]]
name = "code"
type = "local"
path = "/Users/me/work/acme"

[[sources]]
name = "docs"
type = "s3"
bucket = "acme-docs"
region = "us-east-1"
prefix = "engineering/"
# Credentials read from ~/.aws/credentials, env, or IAM role.

[[sources]]
name = "team-drive"
type = "drive"
folder_id = "1AbCdEfGhIjKlMnOpQrSt"
# OAuth2 token in $STRATAFS_DRIVE_TOKEN.

[[sources]]
name = "infra-jira"
type = "jira"
host = "https://acme.atlassian.net"
project = "INFRA"
# Bearer token in $STRATAFS_JIRA_TOKEN.
```
Real-time vs. polled
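Local sources use the operating system's native file-change notifications
(inotify on Linux, kqueue on macOS, ReadDirectoryChangesW on Windows, all
wrapped by the fsnotify library), so updates land with sub-second latency.
Cloud sources poll at a configurable interval (default 60 seconds for
drive-style sources, 5 minutes for buckets), using delta APIs where the
backend offers them. The per-source SQLite database stores the sync cursor
or object ETags, so polls stay cheap: unchanged content is not re-read.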
What gets indexed
Every file that matches the source's include globs (default:
all files) and isn't matched by its exclude rules
(default: .git/, node_modules/,
.venv/, and binary files). The default exclusion list is tuned
to skip obvious noise; you can override it per source.
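The include/exclude rules can be sketched as below. The glob semantics are an assumption: a pattern ending in / excludes any directory with that name, other patterns are Go path.Match globs tried against the file's base name, and "binary" means a NUL byte in the first 8000 bytes (a heuristic similar to Git's). ShouldIndex and LooksBinary are hypothetical names.

```go
package main

import (
	"bytes"
	"path"
	"strings"
)

// matches reports whether relPath (slash-separated, relative to the source
// root) matches one include or exclude pattern.
func matches(pattern, relPath string) bool {
	if strings.HasSuffix(pattern, "/") {
		// Directory pattern: match any ancestor directory with this name.
		dir := strings.TrimSuffix(pattern, "/")
		parts := strings.Split(relPath, "/")
		for _, part := range parts[:len(parts)-1] {
			if part == dir {
				return true
			}
		}
		return false
	}
	// File pattern: glob against the base name only.
	ok, err := path.Match(pattern, path.Base(relPath))
	return err == nil && ok
}

// ShouldIndex applies include patterns (none means "all files"), then
// exclude patterns; an exclude match always wins.
func ShouldIndex(relPath string, include, exclude []string) bool {
	included := len(include) == 0
	for _, p := range include {
		if matches(p, relPath) {
			included = true
			break
		}
	}
	if !included {
		return false
	}
	for _, p := range exclude {
		if matches(p, relPath) {
			return false
		}
	}
	return true
}

// LooksBinary treats content as binary if a NUL byte appears in its first
// 8000 bytes.
func LooksBinary(content []byte) bool {
	n := len(content)
	if n > 8000 {
		n = 8000
	}
	return bytes.IndexByte(content[:n], 0) >= 0
}
```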
Per-source isolation is a property of the storage layout, not a
convention: each source maps to its own SQLite file at
.stratafs/sources/<name>.db. Drop a source, delete a
file. No vacuum required.
Credentials
Read-only credentials, always:
- Local: file mode 444 will work; chmod -R a-w is fine.
- S3: s3:ListBucket + s3:GetObject are sufficient.
- GCS: storage.objects.get + storage.objects.list.
- Azure Blob: the Storage Blob Data Reader role.
- SharePoint: Sites.Read.All via Microsoft Graph.
- Google Drive: https://www.googleapis.com/auth/drive.readonly.
- Jira: a Personal Access Token with read:jira-work.
Multiple sources, one query
Each source is its own SQLite database, but a single /search call
can fan out across all of them. The hybrid score is normalized per source, so
a 0.85 hit from your docs corpus (say, one driven by BM25) is comparable to a
0.85 hit from your code (say, one driven by vector similarity). To scope a
query to a subset of sources, pass ?source=code,docs.
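Per-source normalization can be sketched with min-max scaling, which is an assumption; the text does not say which normalization StrataFS uses. Each source's raw hybrid scores are rescaled to [0, 1] independently, then merged and sorted. Hit, Merge and ParseScope are hypothetical names; ParseScope reads the source parameter with net/url.

```go
package main

import (
	"net/url"
	"sort"
	"strings"
)

type Hit struct {
	Source string
	Path   string
	Score  float64 // raw hybrid score, comparable only within its own source
}

// normalize min-max scales one source's scores to [0, 1].
func normalize(hits []Hit) []Hit {
	out := make([]Hit, len(hits))
	copy(out, hits)
	if len(out) == 0 {
		return out
	}
	lo, hi := out[0].Score, out[0].Score
	for _, h := range out[1:] {
		if h.Score < lo {
			lo = h.Score
		}
		if h.Score > hi {
			hi = h.Score
		}
	}
	for i := range out {
		if hi == lo {
			out[i].Score = 1 // all hits tied: treat each as the best
		} else {
			out[i].Score = (out[i].Score - lo) / (hi - lo)
		}
	}
	return out
}

// ParseScope turns "?source=code,docs" into ["code", "docs"]; nil means all.
func ParseScope(rawQuery string) ([]string, error) {
	q, err := url.ParseQuery(strings.TrimPrefix(rawQuery, "?"))
	if err != nil {
		return nil, err
	}
	v := q.Get("source")
	if v == "" {
		return nil, nil
	}
	return strings.Split(v, ","), nil
}

// Merge normalizes each in-scope source, then returns the top k overall
// (k <= 0 means no limit).
func Merge(perSource map[string][]Hit, scope []string, k int) []Hit {
	allowed := make(map[string]bool)
	for _, s := range scope {
		allowed[s] = true
	}
	names := make([]string, 0, len(perSource))
	for name := range perSource {
		if len(scope) == 0 || allowed[name] {
			names = append(names, name)
		}
	}
	sort.Strings(names) // deterministic order for tied scores
	var merged []Hit
	for _, name := range names {
		merged = append(merged, normalize(perSource[name])...)
	}
	sort.SliceStable(merged, func(i, j int) bool { return merged[i].Score > merged[j].Score })
	if k > 0 && len(merged) > k {
		merged = merged[:k]
	}
	return merged
}
```

Min-max scaling gives each source's best hit a score of 1.0, which keeps a corpus with large raw scores from crowding out the others; the trade-off is that a weak source's best hit is promoted just as far.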
Adding a backend
The FileSystem interface is small: List,
Stat, Read, Watch. The registry
pattern picks an implementation from the type field. Adding
a new backend (Confluence, Notion, Dropbox) is one Go file plus a factory
branch.
Bring all your storage under one search.
Local, S3, GCS, Azure, SharePoint, Drive, Jira — same query interface, isolated databases, read-only credentials.