
Specification Overview

A valid .omnidata container is a directory bundle that conforms to Schema Version 2 of the OmniData specification. This page defines what must be present and the conventions all implementations must follow.

Directory structure

A valid .omnidata directory contains these entries (five files and one directory):

name.omnidata/
├── manifest.json          Identity: owner, hat, schema version
├── index.db               SQLite: resources, chunks, embeddings, FTS5, queue, kv
├── memory.db              SQLite: collections, edges, tags, memory
├── blobs/                 Filesystem: content-addressed by SHA-256
│   └── ab/ab3f7c...sha256
├── adapters.json          Adapter registry: config and sync state
└── ingress.log            Append-only JSONL: ingest audit trail

Every entry must exist, even if it is empty or contains only defaults. A directory missing any required entry is not a valid .omnidata container.
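The presence rule above can be checked mechanically. The following is a minimal sketch, not a reference validator: the function name missing_entries is hypothetical, and it checks only that each entry exists, not that its contents are valid.

```python
from pathlib import Path

# Required entries, taken from the directory listing above.
REQUIRED_FILES = ("manifest.json", "index.db", "memory.db", "adapters.json", "ingress.log")
REQUIRED_DIRS = ("blobs",)


def missing_entries(container: Path) -> list[str]:
    """Return the required entries that are absent from the container directory."""
    missing = [name for name in REQUIRED_FILES if not (container / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (container / name).is_dir()]
    return missing
```

An empty result means the layout requirement is met; any non-empty result means the directory is not a valid container.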

manifest.json

The identity of the container: a single JSON object with owner, hat, schema version, and instance metadata. See Manifest.

index.db

A SQLite database containing the resource registry, text chunks, vector embeddings, the FTS5 full-text index, the processing queue, and the key-value store. This is the primary queryable surface for search and retrieval.

memory.db

A SQLite database containing collections, edges, tags, and structured memory (facts, preferences, events). It is kept separate from index.db so that the knowledge graph can be compacted and backed up independently.

blobs/

A filesystem directory for content-addressed binary storage. Files are named by their SHA-256 hash and organized into fanout subdirectories. See Blob Storage.

adapters.json

A JSON file containing adapter configuration and sync state. Storing this inside the bundle makes the container self-describing and portable. See Adapter Registry.

ingress.log

An append-only JSONL file that records every ingest event. Each line is a JSON object with a timestamp, adapter name, resource URI, and outcome. The log is used for auditing and replay.
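As an illustration of the append-only JSONL format, here is a minimal sketch of a writer. The key names (timestamp, adapter, resource_uri, outcome) and the function name are assumptions; this page lists the fields but does not fix their keys.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_ingest_event(log_path: Path, adapter: str, resource_uri: str, outcome: str) -> dict:
    """Append one ingest event as a single JSON line and return the record written."""
    event = {
        # Key names are assumed for illustration; the spec names the fields only.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "adapter": adapter,
        "resource_uri": resource_uri,
        "outcome": outcome,
    }
    # Mode "a" only ever writes at the end of the file, so existing lines are never touched.
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
    return event
```

Because each event occupies exactly one line, a replay tool can process the file line by line without loading it whole.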

Required PRAGMAs

Both index.db and memory.db must be opened with these PRAGMAs set:

PRAGMA journal_mode = WAL;        -- Write-Ahead Logging for concurrent reads
PRAGMA foreign_keys = ON;         -- Enforce referential integrity
PRAGMA cache_size = -64000;       -- ~64 MB page cache (negative = size in KiB)

WAL mode is non-negotiable. It allows concurrent readers alongside a single writer, which is critical for applications where a sync worker writes while a search surface reads. Foreign keys enforce relational integrity within each database; because SQLite applies the foreign_keys setting per connection, it must be set every time a database is opened. The cache size is a recommended default; implementations may adjust it based on available memory.
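A minimal sketch of applying these PRAGMAs with Python's standard sqlite3 module follows. The helper name open_database is hypothetical. The sketch verifies the journal mode that SQLite reports, since a request for WAL can be declined (for example, on an in-memory database).

```python
import sqlite3


def open_database(path: str) -> sqlite3.Connection:
    """Open a container database and apply the PRAGMAs listed above."""
    conn = sqlite3.connect(path)
    # journal_mode returns the mode actually in effect, so check it rather than assume.
    mode = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL mode for {path!r} (got {mode!r})")
    # foreign_keys and cache_size apply to this connection only.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("PRAGMA cache_size = -64000")
    return conn
```

Routing every open of index.db and memory.db through one helper like this is one way to make sure no connection skips the per-connection settings.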

Conventions

Soft deletes

No row is ever hard-deleted. Every table with user data includes a deleted_at column (TEXT, ISO 8601). A non-NULL deleted_at means the row is logically deleted. All queries should filter with WHERE deleted_at IS NULL unless explicitly querying deleted records.
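The soft-delete convention might look like this in practice. This is a sketch only: the notes table and its columns other than deleted_at are hypothetical and are not part of the OmniData schema.

```python
import sqlite3
from datetime import datetime, timezone


def create_example_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table that follows the deleted_at convention."""
    conn.execute("CREATE TABLE notes (id TEXT PRIMARY KEY, body TEXT, deleted_at TEXT)")


def soft_delete(conn: sqlite3.Connection, note_id: str) -> None:
    """Mark a row as deleted by stamping deleted_at; the row itself stays in place."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    conn.execute(
        "UPDATE notes SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, note_id),
    )


def live_notes(conn: sqlite3.Connection) -> list[tuple]:
    """Return only rows that have not been logically deleted."""
    return conn.execute(
        "SELECT id, body FROM notes WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
```

The extra deleted_at IS NULL condition in the UPDATE keeps the original deletion time if the same row is deleted twice.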

Timestamps

All timestamp columns use ISO 8601 format in UTC: YYYY-MM-DDTHH:MM:SSZ. Columns are named created_at, updated_at, deleted_at, and resource_at (the source’s own timestamp).
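A minimal sketch of producing and parsing timestamps in this format is shown below. The helper names are hypothetical. The sketch rejects naive datetimes rather than guessing their time zone.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_timestamp(moment: datetime) -> str:
    """Convert a timezone-aware datetime to UTC and format it as YYYY-MM-DDTHH:MM:SSZ."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before converting")
    return moment.astimezone(timezone.utc).strftime(TIMESTAMP_FORMAT)


def from_timestamp(text: str) -> datetime:
    """Parse a stored timestamp back into a UTC-aware datetime."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
```

Since every value has the same fixed width and is in UTC, plain string comparison on these TEXT columns orders rows chronologically.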

UUIDs

All id columns use UUID v4 strings. IDs are generated at creation time and never reused.
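For illustration, a version 4 UUID string can be produced with Python's standard uuid module. The helper name new_id is hypothetical, and the lowercase hyphenated form is an assumption, since this page does not specify the string casing.

```python
import uuid


def new_id() -> str:
    """Return a fresh random (version 4) UUID in its 36-character hyphenated form."""
    return str(uuid.uuid4())
```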

Content addressing

Binary content is identified by its SHA-256 hash (content_hash). The same content stored twice produces the same hash and is stored only once in the blobs/ directory. The content_hash recorded for a resource in index.db points to a file on disk, not a database row.
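The deduplication property can be sketched as follows. The layout here is an assumption for illustration: a fan-out directory named after the first two hex characters of the hash, and a file named with the bare hex digest. The exact file naming is defined in Blob Storage and may differ. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def blob_path(blobs_dir: Path, content_hash: str) -> Path:
    """Map a hash to its location (assumed layout: two-character fan-out directory)."""
    return blobs_dir / content_hash[:2] / content_hash


def store_blob(blobs_dir: Path, data: bytes) -> str:
    """Store data under its SHA-256 hash and return that hash as content_hash."""
    content_hash = hashlib.sha256(data).hexdigest()
    target = blob_path(blobs_dir, content_hash)
    # Identical content maps to the same path, so a second store is a no-op.
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return content_hash
```

A production implementation would likely write to a temporary file and rename it into place so that a crash cannot leave a partial blob under a valid hash; the sketch omits that step.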

JSON metadata

Extensible metadata is stored in metadata TEXT columns as JSON objects. Core schema columns cover universal fields; adapter-specific or source-specific data goes in metadata.

Cross-database references

Some tables in memory.db reference resource IDs that live in index.db. These references are logical: they are not enforced by foreign keys, because SQLite foreign keys cannot span databases. Implementations must handle referential integrity at the application layer for cross-database links.
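One way to check cross-database links at the application layer is to attach index.db to a memory.db connection and look for references with no live target. This is a sketch under stated assumptions: the column edges.resource_id and the exact table shapes are hypothetical, and only the table names edges and resources are taken from the directory listing.

```python
import sqlite3


def find_orphaned_edges(memory_db: str, index_db: str) -> list[str]:
    """Return IDs of live edges whose referenced resource is missing or soft-deleted."""
    conn = sqlite3.connect(memory_db)
    try:
        # ATTACH makes index.db queryable from this connection under the alias idx.
        conn.execute("ATTACH DATABASE ? AS idx", (index_db,))
        rows = conn.execute(
            """
            SELECT e.id
            FROM edges AS e
            LEFT JOIN idx.resources AS r
              ON r.id = e.resource_id AND r.deleted_at IS NULL
            WHERE e.deleted_at IS NULL AND r.id IS NULL
            ORDER BY e.id
            """
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()
```

ATTACH allows joins across the two files but does not add foreign-key enforcement between them, so a check like this has to be run explicitly, for example after a sync or before a backup.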

Schema version

The schema_version field in manifest.json identifies which version of the specification this container conforms to. Schema Version 2 is the current version, introducing the directory bundle format. Future versions will include migration paths.
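A reader can refuse containers it does not understand by checking this field before touching the databases. The sketch below assumes that schema_version is stored as the integer 2; see Manifest for the actual representation. The function name is hypothetical.

```python
import json
from pathlib import Path

SUPPORTED_SCHEMA_VERSION = 2  # assumed to be stored as an integer in manifest.json


def check_schema_version(container: Path) -> int:
    """Read manifest.json and return schema_version, or raise if it is unsupported."""
    manifest = json.loads((container / "manifest.json").read_text(encoding="utf-8"))
    version = manifest.get("schema_version")
    if version != SUPPORTED_SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version {version!r}; expected {SUPPORTED_SCHEMA_VERSION}"
        )
    return version
```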