Specification Overview
A valid .omnidata container is a directory bundle that conforms to Schema Version 2 of the OmniData specification. This page defines what must be present and the conventions all implementations must follow.
Directory structure
A valid .omnidata directory contains these entries (five files and one directory):
name.omnidata/
├── manifest.json    Identity: owner, hat, schema version
├── index.db         SQLite: resources, chunks, embeddings, FTS5, queue, kv
├── memory.db        SQLite: collections, edges, tags, memory
├── blobs/           Filesystem: content-addressed by SHA-256
│   └── ab/ab3f7c...sha256
├── adapters.json    Adapter registry: config and sync state
└── ingress.log      Append-only JSONL: ingest audit trail
Every entry listed above must exist, even if it is empty or contains only defaults. A directory missing any required entry is not a valid .omnidata container.
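The presence rule above can be checked mechanically. The following is a minimal sketch, not a reference validator: the function name `missing_entries` is hypothetical, and it only tests that each entry exists, not that its contents are well-formed.

```python
from pathlib import Path

# Required entries, taken from the directory listing above.
REQUIRED_FILES = ("manifest.json", "index.db", "memory.db", "adapters.json", "ingress.log")
REQUIRED_DIRS = ("blobs",)


def missing_entries(container: Path) -> list[str]:
    """Return the required entries that are absent from the container.

    Hypothetical helper: checks presence only, never content.
    """
    missing = [name for name in REQUIRED_FILES if not (container / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (container / name).is_dir()]
    return missing
```

An empty result means the bundle passes the structural check; any other result lists what must be created before the directory can be treated as a container.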
manifest.json
The identity of the container. A single JSON object with owner, hat, schema version, and instance metadata. See Manifest.
index.db
A SQLite database containing the resource registry, text chunks, vector embeddings, the FTS5 full-text index, the processing queue, and the key-value store. This is the primary queryable surface for search and retrieval.
memory.db
A SQLite database containing collections, edges, tags, and structured memory (facts, preferences, events). Separated from index.db to allow independent compaction and backup of the knowledge graph.
blobs/
A filesystem directory for content-addressed binary storage. Files are named by their SHA-256 hash and organized into fanout subdirectories. See Blob Storage.
adapters.json
A JSON file containing adapter configuration and sync state. Makes the container self-describing and portable. See Adapter Registry.
ingress.log
An append-only JSONL file that records every ingest event. Each line is a JSON object with a timestamp, adapter name, resource URI, and outcome. Used for auditing and replay.
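As an illustration of the JSONL format, the sketch below appends one event per line and reads the log back for replay. The key names (`timestamp`, `adapter`, `uri`, `outcome`) and both function names are assumptions made for this example; the specification text above names the fields but not their exact keys.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_ingest_event(log_path: Path, adapter: str, uri: str, outcome: str) -> dict:
    """Append one ingest event as a single JSON line (key names are assumed)."""
    event = {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "adapter": adapter,
        "uri": uri,
        "outcome": outcome,
    }
    # Mode "a" only ever adds to the end of the file; existing lines are untouched.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return event


def read_ingest_events(log_path: Path) -> list[dict]:
    """Parse the log back into a list of events, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```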
Required PRAGMAs
Both index.db and memory.db must be opened with these PRAGMAs set:
PRAGMA journal_mode = WAL; -- Write-Ahead Logging for concurrent reads
PRAGMA foreign_keys = ON; -- Enforce referential integrity
PRAGMA cache_size = -64000; -- ~64 MB page cache (a negative value is a size in KiB)
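WAL mode is non-negotiable. It lets readers run concurrently with a single writer, which is critical for applications where a sync worker writes while a search surface reads. Foreign keys enforce relational integrity within each database. SQLite applies foreign_keys per connection, so it must be set every time a database is opened, whereas journal_mode = WAL persists in the database file once set. The cache size is a recommended default rather than a requirement; implementations may adjust it based on available memory.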
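A minimal sketch of opening either database with these settings, using Python's standard `sqlite3` module. The function name `open_omnidata_db` and the `cache_kib` parameter are hypothetical; the check on the returned journal mode is there because SQLite reports the mode actually in effect, which is not `wal` for in-memory databases.

```python
import sqlite3


def open_omnidata_db(path: str, cache_kib: int = 64000) -> sqlite3.Connection:
    """Open index.db or memory.db with the PRAGMAs listed above (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    # foreign_keys is a per-connection setting, so it is applied on every open.
    conn.execute("PRAGMA foreign_keys = ON")
    # A negative cache_size is interpreted by SQLite as a size in KiB.
    conn.execute(f"PRAGMA cache_size = {-int(cache_kib)}")
    return conn
```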
Conventions
Soft deletes
No row is ever hard-deleted. Every table with user data includes a deleted_at column (TEXT, ISO 8601). A non-NULL deleted_at means the row is logically deleted. All queries should filter with WHERE deleted_at IS NULL unless explicitly querying deleted records.
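The soft-delete convention can be sketched as follows. The `tags` table layout in `DEMO_SCHEMA` is a stand-in invented for this example and is not the schema defined by the specification; only the `deleted_at` column and the `IS NULL` filter come from the text above.

```python
import sqlite3

# Hypothetical table used only for this illustration.
DEMO_SCHEMA = "CREATE TABLE tags (id TEXT PRIMARY KEY, name TEXT NOT NULL, deleted_at TEXT)"


def soft_delete_tag(conn: sqlite3.Connection, tag_id: str, now: str) -> bool:
    """Mark a row as deleted instead of removing it; returns False if nothing changed."""
    cur = conn.execute(
        "UPDATE tags SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, tag_id),
    )
    return cur.rowcount == 1


def live_tag_names(conn: sqlite3.Connection) -> list[str]:
    """Default read path: logically deleted rows are filtered out."""
    rows = conn.execute("SELECT name FROM tags WHERE deleted_at IS NULL ORDER BY name")
    return [row[0] for row in rows]
```

The row stays in the table after `soft_delete_tag`, so a query without the filter can still retrieve it for audit or undelete purposes.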
Timestamps
All timestamp columns use ISO 8601 format in UTC: YYYY-MM-DDTHH:MM:SSZ. Columns are named created_at, updated_at, deleted_at, and resource_at (the source’s own timestamp).
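A small sketch of producing and parsing timestamps in this format. The helper names are hypothetical. Because the format is fixed-width and always UTC, plain string comparison of two timestamps matches chronological order, which keeps ORDER BY on these TEXT columns meaningful.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_timestamp(dt: datetime) -> str:
    """Convert an aware datetime to YYYY-MM-DDTHH:MM:SSZ in UTC."""
    if dt.tzinfo is None:
        # Refuse naive datetimes rather than guess their timezone.
        raise ValueError("naive datetime: attach a timezone before converting")
    return dt.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def from_timestamp(text: str) -> datetime:
    """Parse a stored timestamp back into an aware UTC datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```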
UUIDs
All id columns hold UUID v4 strings. An ID is generated when the row is created and is never reused.
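For illustration, IDs of this kind can be generated and checked with Python's standard `uuid` module. Both helper names are hypothetical, and the canonical hyphenated form accepted by `is_uuid4` is an assumption; the specification text above does not state the exact string form.

```python
import uuid


def new_id() -> str:
    """Generate a fresh random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def is_uuid4(value: str) -> bool:
    """Check that a string is a version 4 UUID in hyphenated form (assumed form)."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces and unhyphenated hex; compare to reject those.
    return parsed.version == 4 and str(parsed) == value.lower()
```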
Content addressing
Binary content is identified by its SHA-256 hash (content_hash). The same content stored twice produces the same hash and is stored only once in the blobs/ directory. The content_hash recorded for a resource in index.db points to a file on disk, not a database row.
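The deduplication property can be sketched as below. The on-disk layout used here (a fanout directory named after the first two hex characters, with the file named by the full hash) is an assumption made for this example; the Blob Storage page is the authority on the exact naming. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def blob_path(blobs_dir: Path, content_hash: str) -> Path:
    """Map a hash to a path. ASSUMED layout: <first two hex chars>/<full hash>."""
    return blobs_dir / content_hash[:2] / content_hash


def store_blob(blobs_dir: Path, data: bytes) -> str:
    """Write data under its SHA-256 hash and return the hash.

    Identical content maps to the same path, so it is written only once.
    """
    content_hash = hashlib.sha256(data).hexdigest()
    target = blob_path(blobs_dir, content_hash)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return content_hash
```

Storing the same bytes twice returns the same hash both times and leaves a single file on disk.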
JSON metadata
Extensible metadata is stored in metadata TEXT columns as JSON objects. Core schema columns cover universal fields; adapter-specific or source-specific data goes in metadata.
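A brief sketch of round-tripping adapter-specific data through a metadata column. The `resources` table shown here, its `uri` column, and the example keys `feed` and `word_count` are placeholders for illustration only, not the actual schema.

```python
import json
import sqlite3


def encode_metadata(meta: dict) -> str:
    """Serialize metadata; only JSON objects are accepted, per the convention above."""
    if not isinstance(meta, dict):
        raise TypeError("metadata must be a JSON object")
    return json.dumps(meta, sort_keys=True)


def decode_metadata(text) -> dict:
    """Deserialize a metadata column, treating NULL or empty text as no metadata."""
    return json.loads(text) if text else {}


# Hypothetical table: core fields get columns, everything else goes in metadata.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resources (id TEXT PRIMARY KEY, uri TEXT NOT NULL, metadata TEXT)")
conn.execute(
    "INSERT INTO resources (id, uri, metadata) VALUES (?, ?, ?)",
    ("r1", "https://example.com/post", encode_metadata({"feed": "news", "word_count": 812})),
)
stored = conn.execute("SELECT metadata FROM resources WHERE id = ?", ("r1",)).fetchone()[0]
meta = decode_metadata(stored)
```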
Cross-database references
Some tables in memory.db reference resource IDs that live in index.db. These references are logical: they are not enforced by foreign keys, because SQLite foreign keys cannot span databases. Implementations must handle referential integrity at the application layer for cross-database links.
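One way an application-layer check might look is to attach index.db to a memory.db connection and search for references with no live target. This is a sketch under assumptions: the `tags.resource_id` column, the `resources.id` column, and the choice to treat soft-deleted resources as missing are all hypothetical and would need to match the real schema and policy.

```python
import sqlite3


def dangling_resource_ids(index_path: str, memory_path: str) -> list[str]:
    """List resource IDs referenced from memory.db that have no live row in index.db.

    Table and column names are assumed for illustration.
    """
    conn = sqlite3.connect(memory_path)
    try:
        # ATTACH makes index.db queryable on this connection under the name "idx".
        conn.execute("ATTACH DATABASE ? AS idx", (index_path,))
        rows = conn.execute(
            """
            SELECT DISTINCT t.resource_id
            FROM main.tags AS t
            LEFT JOIN idx.resources AS r
              ON r.id = t.resource_id AND r.deleted_at IS NULL
            WHERE t.deleted_at IS NULL AND r.id IS NULL
            ORDER BY t.resource_id
            """
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

Attaching lets a single query join across both files, but it does not make SQLite enforce the link; the check still has to be run by the application.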
Schema version
The schema_version field in manifest.json identifies which version of the specification the container conforms to. Schema Version 2 is the current version and introduces the directory bundle format. Future versions will include migration paths.
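A reader might guard against unsupported containers with a check like the one below. It assumes `schema_version` is stored as the JSON integer 2; the Manifest page defines the actual type and the remaining fields. The function name is hypothetical.

```python
import json
from pathlib import Path

SUPPORTED_SCHEMA_VERSION = 2


def read_schema_version(container: Path) -> int:
    """Return the container's schema version, or raise if it is not supported.

    ASSUMPTION: schema_version is a JSON integer.
    """
    manifest = json.loads((container / "manifest.json").read_text(encoding="utf-8"))
    version = manifest.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version {version!r}, expected {SUPPORTED_SCHEMA_VERSION}"
        )
    return version
```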