Why a Bundle?

OmniData is a directory bundle: not a server, not a single file, not a cloud service. This is a deliberate architectural choice, made in v0.2.0 after experience with the single-SQLite-file approach of v0.1.0.

The macOS prior art

Directory bundles are a proven pattern. macOS has used them for decades:

  • .app: an application is a directory containing executables, resources, and metadata
  • .key / .pages: iWork documents can be saved as packages, directories containing document data, images, and thumbnails
  • .band: a GarageBand project is a directory containing audio files, MIDI data, and project state

In every case, the directory looks and acts like a single file to the user. You double-click it, you move it, you AirDrop it. The Finder hides the internal structure. But inside, each component uses the right format for its job: XML for structure, binary for media, SQLite for indexes.

OmniData follows the same principle. A .omnidata bundle is a directory that the OS can register as an opaque unit, but internally uses the right tool for each job.

Three approaches compared

Single file (v0.1.0)

Pack everything into one SQLite database: metadata, search indexes, relationships, and binary blobs stored as BLOBs in rows.

Advantages: One file to copy, atomic transactions across everything, simple mental model.

Problems we hit: Large blobs in SQLite rows bloat the database file, and VACUUM becomes expensive. Backup tools can’t deduplicate at the blob level. You can’t use filesystem-level compression or snapshots on individual blobs. Graph traversal queries and full-text search queries compete for the same WAL. Space freed by replaced content is only reused internally, so the database file never shrinks unless you run VACUUM.

Database server

Run a process that exposes an HTTP API, stores data in its own internal format, requires authentication.

Problems: Portability requires export/import. The server must be running for any access. Every consumer must speak the API. Running multiple isolated instances means multiple servers on multiple ports. Knowledge is trapped behind the process.

Directory bundle (v0.2.0)

A directory containing specialized files: SQLite for queries, filesystem for blobs, JSON for configuration.

Advantages: Each component uses the right storage. Blobs are real files that the OS, filesystem, and backup tools understand natively. Two SQLite databases with different access patterns avoid lock contention. The bundle is still portable — cp -r, rsync, zip, AirDrop all work.

SQLite for what SQLite is good at

OmniData uses two SQLite databases, separated by access pattern:

index.db handles search and metadata: resource records, FTS5 full-text indexes, BM25 ranking, embedding vectors, and the work queue. These are read-heavy, write-occasional workloads where SQLite’s WAL mode and ACID transactions shine.
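As a rough illustration of the search side, the sketch below builds an FTS5 table and ranks matches with SQLite's built-in bm25() function. The table name resources_fts and its columns are hypothetical, since the real index.db schema is not shown here, and the example assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Stand-in for <bundle>/index.db; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one FTS5 table holding a title and a body per resource.
conn.execute("CREATE VIRTUAL TABLE resources_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO resources_fts (title, body) VALUES (?, ?)",
    [
        ("bundles", "directory bundle bundle bundle"),
        ("servers", "a server exposes an HTTP API"),
        ("portability", "the bundle is a portable directory that rsync and zip understand"),
        ("blobs", "content addressed files on disk"),
        ("graphs", "edges between resources"),
    ],
)
conn.commit()


def search(conn, query, limit=10):
    """Return titles of matching rows, best BM25 match first.

    bm25() returns lower (more negative) values for better matches,
    so a plain ascending ORDER BY puts the best hit on top.
    """
    rows = conn.execute(
        "SELECT title FROM resources_fts WHERE resources_fts MATCH ? "
        "ORDER BY bm25(resources_fts) LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]


results = search(conn, "bundle")
print(results)
```

Only the two rows whose body contains the token "bundle" come back, and the short row that repeats the term ranks ahead of the longer one.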

memory.db handles hierarchy and relationships: edges between resources, graph traversals, relationship weights. Graph queries hold locks differently from search queries: they walk many rows across indexes in a single, often long-lived, transaction. Keeping them in a separate database, with its own WAL and locks, means a deep graph traversal never blocks a search query or delays a checkpoint on index.db.
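A graph traversal of this kind can be sketched with a recursive common table expression. The edges table and its columns (src, dst, weight) are assumptions made for illustration; the actual memory.db schema may differ.

```python
import sqlite3

# Stand-in for <bundle>/memory.db.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per directed, weighted edge.
conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, weight REAL)")
conn.execute("CREATE INDEX edges_src ON edges (src)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [
        ("a", "b", 1.0),
        ("b", "c", 0.5),
        ("c", "a", 0.2),  # a cycle back to the start
        ("c", "d", 0.9),
        ("x", "y", 1.0),  # a separate component
    ],
)


def reachable(conn, start, max_depth=5):
    """Return (node, shortest hop count) for everything reachable from start.

    The depth bound keeps the walk finite even when the graph has cycles.
    """
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM walk AS w JOIN edges AS e ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT node, MIN(depth) FROM walk
        GROUP BY node
        ORDER BY MIN(depth), node
        """,
        (start, max_depth),
    )
    return rows.fetchall()


print(reachable(conn, "a"))
```

The whole walk runs as one read transaction, which is the access pattern the paragraph above describes.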

SQLite is the most widely deployed database engine in the world, and the US Library of Congress lists it among its recommended storage formats for the long-term preservation of datasets.

Filesystem for what the filesystem is good at

Blobs are files on disk, stored by SHA-256 hash in fanout directories:

blobs/
  ab/
    ab3f7c8e91...  (a PDF)
  f0/
    f0d14a22b7...  (a screenshot)

This is the same content-addressed pattern used by Git (.git/objects) and Docker image layers, and it is close in spirit to the hash-addressed Nix store. It gives you:

Deduplication for free. Same content, same hash, one file on disk. Ten resources referencing the same PDF store it once.
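A minimal sketch of such a store, assuming the layout shown above (the first two hex characters of the SHA-256 digest as the fanout directory, the full digest as the filename). The function names put_blob and get_blob are illustrative, not part of any OmniData API.

```python
import hashlib
import tempfile
from pathlib import Path


def put_blob(blobs_dir: Path, data: bytes) -> str:
    """Store data under its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = blobs_dir / digest[:2] / digest
    if not path.exists():  # same content, same path: nothing to write
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename into place so readers never see a partial file
    return digest


def get_blob(blobs_dir: Path, digest: str) -> bytes:
    """Read a blob back and verify it still matches its digest."""
    data = (blobs_dir / digest[:2] / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"blob {digest} is corrupt")
    return data


with tempfile.TemporaryDirectory() as root:
    blobs = Path(root) / "blobs"
    first = put_blob(blobs, b"same PDF bytes")
    second = put_blob(blobs, b"same PDF bytes")  # deduplicated: no second file
    file_count = sum(1 for p in blobs.rglob("*") if p.is_file())
    print(first == second, file_count)
```

Because the path is derived from the content, deduplication needs no lookup table: writing the same bytes twice resolves to the same file.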

Filesystem-as-runtime. Because blobs are real files, you inherit everything the filesystem offers:

  • btrfs: transparent compression, snapshots, block-level dedup
  • ZFS: send/receive for incremental replication, checksumming, snapshots
  • APFS: instant clones, space sharing between bundles
  • S3-compatible backends: mount with s3fs or goofys, blobs become objects

None of this requires a single line of OmniData code. The filesystem does the work.

Standard tooling. du -sh blobs/ tells you content size. find blobs/ -mtime -1 shows recent additions. rsync --link-dest creates space-efficient backups with hardlinks for unchanged blobs.

Still portable

A .omnidata bundle is a directory. Moving it is the same as moving any directory:

cp -r my-knowledge.omnidata /Volumes/USB/
zip -r my-knowledge.omnidata.zip my-knowledge.omnidata/
rsync -a my-knowledge.omnidata remote:~/

Any language with SQLite bindings and file I/O can read it: open the SQLite databases, read files from the blob directory, parse the JSON manifest. No HTTP client needed. No API versioning. No authentication tokens.
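To make that concrete, here is a sketch that reads a bundle with nothing but the Python standard library. The manifest filename (manifest.json), the resources table, and its columns are hypothetical stand-ins; only index.db and the blobs/ fanout layout come from this page. The helper make_demo_bundle exists solely so the example can run on its own.

```python
import hashlib
import json
import sqlite3
import tempfile
from pathlib import Path


def make_demo_bundle(root: Path) -> Path:
    """Build a tiny bundle with an assumed layout so the reader has input."""
    bundle = root / "demo.omnidata"
    (bundle / "blobs").mkdir(parents=True)
    (bundle / "manifest.json").write_text(json.dumps({"format_version": "0.2.0"}))

    payload = b"hello from a blob"
    digest = hashlib.sha256(payload).hexdigest()
    blob_path = bundle / "blobs" / digest[:2] / digest
    blob_path.parent.mkdir()
    blob_path.write_bytes(payload)

    db = sqlite3.connect(bundle / "index.db")
    db.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, title TEXT, blob_sha256 TEXT)"
    )
    db.execute(
        "INSERT INTO resources (title, blob_sha256) VALUES (?, ?)", ("greeting", digest)
    )
    db.commit()
    db.close()
    return bundle


def read_bundle(bundle: Path):
    """Read manifest, resource rows, and blob contents without any runtime."""
    manifest = json.loads((bundle / "manifest.json").read_text())
    # Open read-only so a plain consumer cannot modify the bundle by accident.
    uri = (bundle / "index.db").resolve().as_uri() + "?mode=ro"
    db = sqlite3.connect(uri, uri=True)
    try:
        rows = db.execute("SELECT title, blob_sha256 FROM resources").fetchall()
    finally:
        db.close()
    resources = {
        title: (bundle / "blobs" / digest[:2] / digest).read_bytes()
        for title, digest in rows
    }
    return manifest, resources


with tempfile.TemporaryDirectory() as root:
    manifest, resources = read_bundle(make_demo_bundle(Path(root)))
    print(manifest, resources)
```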

The runtime layer

On its own, the bundle format only defines databases and files that can be read and written. The OmniData runtime adds:

  • Adapters: Plugins that discover and ingest content
  • Chunking: Breaking text into segments for embedding
  • Embedding: Generating vector representations for semantic search
  • RRF Search: Reciprocal Rank Fusion combining vector similarity and full-text search
  • Blob management: Content-addressed storage with hash verification
  • Graph operations: Traversals, path-finding, and relationship inference over memory.db
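Of the features above, Reciprocal Rank Fusion is the easiest to show in a few lines. Each document scores 1 / (k + rank) in every ranked list it appears in, and the scores are summed across lists. The sketch below is a generic version of that formula, not OmniData's implementation; k = 60 is a commonly used default, and the value the runtime actually uses is not stated here.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: an iterable of lists, each ordered best-first.
    Returns document IDs ordered by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrieval paths.
vector_hits = ["doc_b", "doc_a", "doc_c"]  # nearest neighbours by embedding
fts_hits = ["doc_a", "doc_d", "doc_b"]     # best BM25 matches

fused = rrf_fuse([vector_hits, fts_hits])
print(fused)
```

Because RRF uses only ranks, it never has to reconcile cosine similarities with BM25 scores, which sit on unrelated scales. Documents that appear in both lists rise to the top.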

The runtime operates ON the bundle format. It is not part of the format. Any implementation that reads and writes the schema correctly is a valid OmniData runtime.

When a server makes sense

A bundle is the right default. But some workloads genuinely need a server:

  • Multi-writer concurrency: If many processes need to write simultaneously, a server can serialize writes more efficiently than SQLite’s single-writer lock.
  • Access control: If different users need different permissions on the same knowledge, a server can enforce authorization that a file on disk cannot.
  • Real-time sync: If changes must propagate instantly to remote consumers, a server with WebSockets or change streams is more natural than polling a file.

In these cases, a server that reads and writes the .omnidata bundle format is a valid architecture. The bundle is the storage format; the server is the access layer. The knowledge remains portable — shut down the server, the bundle is still a directory you can copy.