Storage backends

Seven backends. One query interface. Your data stays put.

StrataFS reads from your storage. It does not move, copy, or transform the source files. Indexes live in a parallel .stratafs/ directory, one SQLite database per source.

Local filesystem (real-time): fsnotify-driven, instant updates when files change.
Amazon S3 (polled): polling-based sync, IAM-friendly read-only credentials.
Google Cloud Storage (polled): service account JSON, bucket-level scoping.
Azure Blob Storage (polled): account key or SAS, container-level scoping.
SharePoint / OneDrive (polled): Microsoft Graph delta API, enterprise-ready.
Google Drive (polled): OAuth2 + native Docs export.
Jira (polled): issues, descriptions, attachments via REST API.

How sources are configured

One TOML block per source. Every backend uses the same schema shape; only the type and the backend-specific location and credential settings change.

# config.toml

[[sources]]
name = "code"
type = "local"
path = "/Users/me/work/acme"

[[sources]]
name = "docs"
type = "s3"
bucket = "acme-docs"
region = "us-east-1"
prefix = "engineering/"
# Credentials read from ~/.aws/credentials, env, or IAM role.

[[sources]]
name = "team-drive"
type = "drive"
folder_id = "1AbCdEfGhIjKlMnOpQrSt"
# OAuth2 token in $STRATAFS_DRIVE_TOKEN.

[[sources]]
name = "infra-jira"
type = "jira"
host = "https://acme.atlassian.net"
project = "INFRA"
# Bearer token in $STRATAFS_JIRA_TOKEN.
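
To make the "same shape, different fields" idea concrete, here is a minimal Go sketch of how one [[sources]] block might be represented and checked. The field names come from the config keys above; the Source struct, the Validate method, and the choice of which keys are required per type are assumptions for illustration, not StrataFS's actual schema. Only the four types shown in the example are covered.

```go
package main

import (
	"errors"
	"fmt"
)

// Source mirrors one [[sources]] block. The struct is a hypothetical sketch;
// only the key names are taken from the config example.
type Source struct {
	Name     string
	Type     string
	Path     string // local
	Bucket   string // s3
	Region   string // s3
	Prefix   string // s3
	FolderID string // drive
	Host     string // jira
	Project  string // jira
}

// Validate checks the keys each type relies on in the example config.
// Which keys are truly required is an assumption.
func (s Source) Validate() error {
	if s.Name == "" {
		return errors.New("source: name is required")
	}
	required := map[string]string{}
	switch s.Type {
	case "local":
		required["path"] = s.Path
	case "s3":
		required["bucket"] = s.Bucket
	case "drive":
		required["folder_id"] = s.FolderID
	case "jira":
		required["host"] = s.Host
		required["project"] = s.Project
	default:
		return fmt.Errorf("source %q: type %q is not covered by this sketch", s.Name, s.Type)
	}
	for key, val := range required {
		if val == "" {
			return fmt.Errorf("source %q: %s is required for type %q", s.Name, key, s.Type)
		}
	}
	return nil
}

func main() {
	sources := []Source{
		{Name: "code", Type: "local", Path: "/Users/me/work/acme"},
		{Name: "docs", Type: "s3", Bucket: "acme-docs", Region: "us-east-1", Prefix: "engineering/"},
		{Name: "infra-jira", Type: "jira", Host: "https://acme.atlassian.net"}, // project left out on purpose
	}
	for _, s := range sources {
		if err := s.Validate(); err != nil {
			fmt.Println("invalid:", err)
			continue
		}
		fmt.Println("ok:", s.Name)
	}
}
```

Credentials are deliberately absent from the struct: in the example config they come from the environment or the cloud provider's own credential chain rather than from the TOML file.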

Real-time vs. polled

Local sources use the operating system's native file-change notifications through fsnotify (inotify on Linux, kqueue on macOS, ReadDirectoryChangesW on Windows), so updates land with sub-second latency. Cloud sources poll at a configurable interval (default 60 seconds for drive-style sources, 5 minutes for buckets), using a delta API where the backend offers one. The per-source SQLite database holds the cursor or ETag from the last poll, so polls are cheap.
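
For bucket-style sources, the stored ETags are what keep a poll cheap: only keys whose ETag changed need to be fetched again. The following Go sketch shows that comparison under simple assumptions. The Diff function and the plain map representation are hypothetical; a real poll would fill the current map from the backend's list call and the previous map from the source's SQLite database.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff compares the ETags saved by the previous poll with the ETags in the
// current listing. It returns the keys that are new or modified and the keys
// that have disappeared, each in sorted order.
func Diff(prev, curr map[string]string) (changed, removed []string) {
	for key, etag := range curr {
		if old, ok := prev[key]; !ok || old != etag {
			changed = append(changed, key)
		}
	}
	for key := range prev {
		if _, ok := curr[key]; !ok {
			removed = append(removed, key)
		}
	}
	sort.Strings(changed)
	sort.Strings(removed)
	return changed, removed
}

func main() {
	prev := map[string]string{"a.md": "e1", "b.md": "e2", "c.md": "e3"}
	curr := map[string]string{"a.md": "e1", "b.md": "e9", "d.md": "e4"}

	changed, removed := Diff(prev, curr)
	fmt.Println("changed:", changed) // changed: [b.md d.md]
	fmt.Println("removed:", removed) // removed: [c.md]
}
```

Backends with a delta API (Microsoft Graph, for example) return the changed items directly, so the stored value there would be a cursor rather than a map of ETags.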

What gets indexed

StrataFS indexes every file that matches the source's include globs (default: all files) and is not matched by its exclude globs (default: .git/, node_modules/, .venv/, and binary files). The default exclusion list targets obvious noise; you can override it per source.
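
As a rough illustration of include and exclude filtering, the Go sketch below decides whether a relative path should be indexed. The matching semantics are assumptions, since the exact glob dialect is not specified here: a pattern ending in "/" excludes any path that has that directory name as a segment, and every other pattern is matched against the file's base name with the standard library's path.Match. Binary-file detection is left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// ShouldIndex reports whether a slash-separated relative path passes the
// include globs and is not hit by any exclude pattern.
// Assumed semantics: "name/" excludes a directory anywhere in the path;
// other patterns are matched against the base name with path.Match.
func ShouldIndex(rel string, includes, excludes []string) bool {
	segments := strings.Split(rel, "/")
	base := segments[len(segments)-1]
	dirs := segments[:len(segments)-1]

	for _, pat := range excludes {
		if strings.HasSuffix(pat, "/") {
			name := strings.TrimSuffix(pat, "/")
			for _, d := range dirs {
				if d == name {
					return false
				}
			}
			continue
		}
		// A malformed pattern returns an error and simply does not match.
		if ok, _ := path.Match(pat, base); ok {
			return false
		}
	}

	if len(includes) == 0 {
		return true // default: all files
	}
	for _, pat := range includes {
		if ok, _ := path.Match(pat, base); ok {
			return true
		}
	}
	return false
}

func main() {
	excludes := []string{".git/", "node_modules/", ".venv/"}
	paths := []string{"src/main.go", "web/node_modules/x/index.js", ".git/config"}
	for _, p := range paths {
		fmt.Println(p, ShouldIndex(p, nil, excludes))
	}
}
```

Excludes are checked first, so a file inside an excluded directory stays out even when an include glob would match it.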

Per-source isolation is a property of the schema, not a convention: each source maps to its own SQLite file under .stratafs/sources/<name>.db. Drop a source, delete a file. No vacuum required.
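
The "drop a source, delete a file" property can be sketched in a few lines of Go. DBPath follows the .stratafs/sources/<name>.db layout stated above; the function names and the root parameter are hypothetical, and the demo works in a temporary directory so it does not touch a real index.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// DBPath returns the index database location for a source, following the
// .stratafs/sources/<name>.db layout.
func DBPath(root, name string) string {
	return filepath.Join(root, ".stratafs", "sources", name+".db")
}

// DropSource removes a source's index by deleting its database file.
// A file that is already gone counts as success.
func DropSource(root, name string) error {
	err := os.Remove(DBPath(root, name))
	if os.IsNotExist(err) {
		return nil
	}
	return err
}

func main() {
	root, err := os.MkdirTemp("", "stratafs-demo")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	// Create an empty stand-in for the source's database file.
	db := DBPath(root, "docs")
	if err := os.MkdirAll(filepath.Dir(db), 0o755); err != nil {
		panic(err)
	}
	if err := os.WriteFile(db, nil, 0o644); err != nil {
		panic(err)
	}

	fmt.Println("dropping", db)
	if err := DropSource(root, "docs"); err != nil {
		fmt.Println("drop failed:", err)
	}
}
```

If a database is opened in SQLite's WAL mode, SQLite keeps -wal and -shm files next to it, and a cleanup routine would want to remove those as well.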

Credentials

Read-only credentials, always:

Local: file mode 444 works for files; chmod -R a-w is fine and leaves directories traversable.
S3: s3:ListBucket + s3:GetObject are sufficient.
GCS: storage.objects.get + storage.objects.list.
Azure Blob: Storage Blob Data Reader role.
SharePoint: Sites.Read.All via Microsoft Graph.
Google Drive: https://www.googleapis.com/auth/drive.readonly.
Jira: a Personal Access Token with read:jira-work.

Multiple sources, one query

Each source is its own SQLite database, but a single /search call can fan out across all of them. The hybrid score is normalized per source so a 0.85 BM25 hit from your docs corpus is comparable to a 0.85 vector hit from your code. You can scope a query to a subset with ?source=code,docs.
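
One way to picture the fan-out is: normalize each source's scores on their own scale, then sort the union. The Go sketch below does that with min-max scaling. The normalization method is an assumption (the formula StrataFS uses is not stated here), and the Hit type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one search result from a single source database.
type Hit struct {
	Source string
	Path   string
	Score  float64
}

// Normalize rescales one source's scores into [0, 1] with min-max scaling.
// Min-max is an assumption chosen for illustration.
func Normalize(hits []Hit) []Hit {
	if len(hits) == 0 {
		return nil
	}
	lo, hi := hits[0].Score, hits[0].Score
	for _, h := range hits {
		if h.Score < lo {
			lo = h.Score
		}
		if h.Score > hi {
			hi = h.Score
		}
	}
	out := make([]Hit, len(hits))
	for i, h := range hits {
		out[i] = h
		if hi == lo {
			out[i].Score = 1 // every hit ties, so treat them all as top-ranked
		} else {
			out[i].Score = (h.Score - lo) / (hi - lo)
		}
	}
	return out
}

// Merge normalizes each source separately, then orders the combined hits by
// normalized score, highest first, breaking ties by path.
func Merge(perSource map[string][]Hit) []Hit {
	var all []Hit
	for _, hits := range perSource {
		all = append(all, Normalize(hits)...)
	}
	sort.SliceStable(all, func(i, j int) bool {
		if all[i].Score != all[j].Score {
			return all[i].Score > all[j].Score
		}
		return all[i].Path < all[j].Path
	})
	return all
}

func main() {
	results := Merge(map[string][]Hit{
		"docs": {
			{Source: "docs", Path: "runbook.md", Score: 12.4}, // raw BM25-style score
			{Source: "docs", Path: "faq.md", Score: 3.1},
		},
		"code": {
			{Source: "code", Path: "deploy.go", Score: 0.91}, // raw similarity-style score
			{Source: "code", Path: "util.go", Score: 0.40},
		},
	})
	for _, h := range results {
		fmt.Printf("%.2f  %s/%s\n", h.Score, h.Source, h.Path)
	}
}
```

Min-max scaling always maps a source's weakest hit to 0, which can undersell a source whose results are all strong. That trade-off is one reason a production system might normalize differently.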

Adding a backend

The FileSystem interface is small: List, Stat, Read, Watch. The registry pattern picks an implementation from the type field. Adding a new backend (Confluence, Notion, Dropbox) is one Go file plus a factory branch.
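
The sketch below shows what a four-method interface and a type-keyed registry could look like in Go. Only the method names (List, Stat, Read, Watch) come from the text above; every signature, the Entry and Factory types, the Register and Open helpers, and the toy memFS backend are assumptions for illustration rather than StrataFS's actual definitions.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
	"time"
)

// Entry describes one file or object in a source.
type Entry struct {
	Path    string
	Size    int64
	ModTime time.Time
}

// FileSystem carries the four methods named above, with assumed signatures.
type FileSystem interface {
	List(ctx context.Context, prefix string) ([]Entry, error)
	Stat(ctx context.Context, path string) (Entry, error)
	Read(ctx context.Context, path string) (io.ReadCloser, error)
	Watch(ctx context.Context) (<-chan Entry, error)
}

// Factory builds a backend from the settings of one source block.
type Factory func(settings map[string]string) (FileSystem, error)

var registry = map[string]Factory{}

// Register associates a config type string with a factory.
func Register(typ string, f Factory) { registry[typ] = f }

// Open picks the implementation that matches the type field.
func Open(typ string, settings map[string]string) (FileSystem, error) {
	f, ok := registry[typ]
	if !ok {
		return nil, fmt.Errorf("unknown source type %q", typ)
	}
	return f(settings)
}

// memFS is a toy in-memory backend standing in for a new integration.
type memFS struct{ files map[string]string }

func (m memFS) List(_ context.Context, prefix string) ([]Entry, error) {
	var out []Entry
	for p, body := range m.files {
		if strings.HasPrefix(p, prefix) {
			out = append(out, Entry{Path: p, Size: int64(len(body))})
		}
	}
	return out, nil
}

func (m memFS) Stat(_ context.Context, path string) (Entry, error) {
	body, ok := m.files[path]
	if !ok {
		return Entry{}, fmt.Errorf("%s: not found", path)
	}
	return Entry{Path: path, Size: int64(len(body))}, nil
}

func (m memFS) Read(_ context.Context, path string) (io.ReadCloser, error) {
	body, ok := m.files[path]
	if !ok {
		return nil, fmt.Errorf("%s: not found", path)
	}
	return io.NopCloser(strings.NewReader(body)), nil
}

// Watch returns an already-closed channel: this toy backend reports no changes.
func (m memFS) Watch(_ context.Context) (<-chan Entry, error) {
	ch := make(chan Entry)
	close(ch)
	return ch, nil
}

func main() {
	Register("mem", func(map[string]string) (FileSystem, error) {
		return memFS{files: map[string]string{"notes/a.md": "hello"}}, nil
	})
	fs, err := Open("mem", nil)
	if err != nil {
		panic(err)
	}
	entries, _ := fs.List(context.Background(), "notes/")
	fmt.Println(len(entries), entries[0].Path)
}
```

In this arrangement the indexer only ever sees the FileSystem interface, so a new backend amounts to one type that satisfies it plus one Register call.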

Bring all your storage under one search.

Local, S3, GCS, Azure, SharePoint, Drive, Jira — same query interface, isolated databases, read-only credentials.

