Resources

The omnidata_resources table is the central registry of an OmniData container. It lives in index.db inside the .omnidata bundle. Every piece of content — a web page, a file, an email, a voice recording, a screenshot — gets exactly one row in this table, identified by its URI.

Schema

-- index.db
CREATE TABLE omnidata_resources (
    id              TEXT PRIMARY KEY,
    uri             TEXT NOT NULL UNIQUE,
    source          TEXT NOT NULL,
    resource_type   TEXT NOT NULL,
    title           TEXT,
    content_hash    TEXT,
    byte_size       INTEGER,
    mime_type       TEXT,
    resource_at     TEXT,
    pipeline_state  TEXT NOT NULL DEFAULT 'bronze',
    metadata        TEXT DEFAULT '{}',
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    updated_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    deleted_at      TEXT
);
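
To experiment with the schema outside an SDK, you can load the DDL into an in-memory SQLite database with Python's standard sqlite3 module. The sketch below inserts one row that supplies only the required columns plus a title, then reads back the values the column defaults fill in. The helper names create_index_db and insert_resource are hypothetical and are not part of any OmniData SDK.

```python
import sqlite3
import uuid

# The omnidata_resources DDL, copied verbatim from the schema above.
SCHEMA = """
CREATE TABLE omnidata_resources (
    id              TEXT PRIMARY KEY,
    uri             TEXT NOT NULL UNIQUE,
    source          TEXT NOT NULL,
    resource_type   TEXT NOT NULL,
    title           TEXT,
    content_hash    TEXT,
    byte_size       INTEGER,
    mime_type       TEXT,
    resource_at     TEXT,
    pipeline_state  TEXT NOT NULL DEFAULT 'bronze',
    metadata        TEXT DEFAULT '{}',
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    updated_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    deleted_at      TEXT
);
"""


def create_index_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database containing the resources table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # access columns by name
    conn.executescript(SCHEMA)
    return conn


def insert_resource(conn, uri, source, resource_type, title=None) -> str:
    """Insert a new resource with a fresh UUID v4 id and return that id."""
    resource_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO omnidata_resources (id, uri, source, resource_type, title) "
        "VALUES (?, ?, ?, ?, ?)",
        (resource_id, uri, source, resource_type, title),
    )
    return resource_id


conn = create_index_db()
rid = insert_resource(conn, "file:///path/to/doc.pdf", "filesystem", "document", "doc.pdf")
row = conn.execute("SELECT * FROM omnidata_resources WHERE id = ?", (rid,)).fetchone()
# Defaults: pipeline_state 'bronze', metadata '{}', created_at/updated_at set to now (UTC).
print(row["pipeline_state"], row["metadata"], row["created_at"])
```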

Columns

id

Text (UUID v4). Primary key. Generated at creation, never reused.

uri

Text. The unique identifier for this resource in its source system. Uses source-specific URI schemes: file:///path/to/doc.pdf, tosh://+15551234567/msg-uuid, chrome-capture://2026-03-28T10:00:00Z. The URI is the natural key — if two adapters produce the same URI, they refer to the same resource.
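
These URIs use ordinary scheme://authority/path syntax, so Python's urllib.parse.urlsplit can take them apart. This is useful for logging which scheme a resource uses, or for checking that an adapter emits URIs only in the scheme it expects. The sketch is only an illustration. This section defines no parsing rules beyond the uniqueness of the full string, and the split_uri helper is hypothetical.

```python
from urllib.parse import urlsplit


def split_uri(uri: str) -> tuple[str, str, str]:
    """Return (scheme, authority, path) for a resource URI."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


for example in (
    "file:///path/to/doc.pdf",
    "tosh://+15551234567/msg-uuid",
    "chrome-capture://2026-03-28T10:00:00Z",
):
    print(split_uri(example))
# ('file', '', '/path/to/doc.pdf')
# ('tosh', '+15551234567', '/msg-uuid')
# ('chrome-capture', '2026-03-28T10:00:00Z', '')
```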

source

Text. The name of the adapter that created this resource (e.g., "filesystem", "tosh", "chrome-capture"). Used to route read_content() calls back to the correct adapter during pipeline promotion.
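
One way to picture routing by source is a lookup table from adapter name to adapter object. The section states only that read_content() calls are routed using the source column. It does not define the adapter interface, so the Adapter protocol, its read_content(uri) -> bytes signature, AdapterRegistry, and InMemoryAdapter below are all hypothetical stand-ins.

```python
from typing import Protocol


class Adapter(Protocol):
    """Hypothetical adapter interface: a source name plus a content reader."""

    name: str

    def read_content(self, uri: str) -> bytes: ...


class InMemoryAdapter:
    """Toy adapter that serves bytes from a dict, for illustration only."""

    def __init__(self, name: str, items: dict[str, bytes]) -> None:
        self.name = name
        self._items = items

    def read_content(self, uri: str) -> bytes:
        return self._items[uri]


class AdapterRegistry:
    """Maps a source column value to the adapter registered under that name."""

    def __init__(self) -> None:
        self._adapters: dict[str, Adapter] = {}

    def register(self, adapter: Adapter) -> None:
        self._adapters[adapter.name] = adapter

    def read_content(self, source: str, uri: str) -> bytes:
        adapter = self._adapters.get(source)
        if adapter is None:
            raise LookupError(f"no adapter registered for source {source!r}")
        return adapter.read_content(uri)


registry = AdapterRegistry()
registry.register(InMemoryAdapter("filesystem", {"file:///path/to/doc.pdf": b"%PDF-1.7"}))

# During promotion, a row's source and uri are enough to fetch its raw bytes again.
row = {"source": "filesystem", "uri": "file:///path/to/doc.pdf"}
content = registry.read_content(row["source"], row["uri"])
```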

resource_type

Text. A coarse classification: "document", "message", "image", "audio", "video", "webpage", "note", "code". Implementations may extend this set.

title

Text, nullable. A human-readable title for display. May be the filename, email subject, message preview, or page title.

content_hash

Text, nullable. SHA-256 hash of the resource’s raw content. Used to locate the corresponding file in the blobs/ directory (content-addressed filesystem storage). NULL if no raw content has been stored (metadata-only resources).
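
Here is a minimal sketch of content-addressed storage. It assumes two things this section does not specify: the hash is stored as a lowercase hex digest, and each blob is written directly to blobs/<digest> with no sharding subdirectories. Check the real bundle layout before relying on either. store_blob and blob_path are hypothetical helpers.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(blobs_dir: Path, data: bytes) -> tuple[str, int]:
    """Write data to blobs_dir/<sha256 hex>; return (content_hash, byte_size).

    Assumes a flat layout; the actual bundle may shard blobs differently.
    """
    digest = hashlib.sha256(data).hexdigest()  # lowercase hex, 64 characters
    path = blobs_dir / digest
    if not path.exists():  # identical bytes map to the same file, stored once
        blobs_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return digest, len(data)


def blob_path(blobs_dir: Path, content_hash: str) -> Path:
    """Locate the blob for a row's content_hash under the assumed flat layout."""
    return blobs_dir / content_hash


with tempfile.TemporaryDirectory() as tmp:
    blobs = Path(tmp) / "blobs"
    content_hash, byte_size = store_blob(blobs, b"hello")
    stored = blob_path(blobs, content_hash).read_bytes()
```

The file name is derived from the bytes, so two resources with identical content share one blob. The digest can also be recomputed to verify that a blob has not been corrupted.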

byte_size

Integer, nullable. Size of the raw content in bytes. Used for storage accounting and display.

mime_type

Text, nullable. IANA media type of the raw content (e.g., "application/pdf", "image/png", "text/plain").

resource_at

Text, nullable (ISO 8601 UTC). The source’s own timestamp for this content — when the message was sent, when the file was last modified, when the page was captured. Distinct from created_at, which records when OmniData ingested it.
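
Adapters often receive source timestamps either as timezone-aware datetime values or as POSIX epoch seconds (for example os.stat().st_mtime). The sketch below normalizes both to UTC using the strftime pattern the schema uses for created_at and updated_at. The column is described only as ISO 8601 UTC, so requiring exactly this second-precision form is an assumption, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Same pattern as the created_at/updated_at column defaults in the schema.
ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"


def to_iso_utc(when: datetime) -> str:
    """Render an aware datetime as an ISO 8601 UTC string (fractional seconds dropped)."""
    if when.tzinfo is None:
        raise ValueError("naive datetime: cannot tell which timezone it is in")
    return when.astimezone(timezone.utc).strftime(ISO_UTC)


def epoch_to_iso_utc(seconds: float) -> str:
    """Convert POSIX epoch seconds (e.g. a file mtime) to an ISO 8601 UTC string."""
    return to_iso_utc(datetime.fromtimestamp(seconds, tz=timezone.utc))


# A message sent at 12:00 in UTC+2 is stored as 10:00 UTC.
sent = datetime(2026, 3, 28, 12, 0, tzinfo=timezone(timedelta(hours=2)))
resource_at = to_iso_utc(sent)  # "2026-03-28T10:00:00Z"
```

The sketch rejects naive datetimes on purpose. Silently treating them as local time or as UTC would shift resource_at by an unknown offset.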

pipeline_state

Text. One of "bronze", "silver", or "gold". Tracks how far this resource has been processed. See Pipeline States for details.
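
Suppose the three states form a strict forward sequence: bronze, then silver, then gold. A promotion step can then be written as a compare-and-set UPDATE that advances a row only if it is still in the state the worker read, so two workers cannot both promote the same resource. The ordering and the promote() helper are assumptions for illustration; Pipeline States is the authority on which transitions are allowed. The table is trimmed to the columns this sketch touches.

```python
import sqlite3
from typing import Optional

# Assumed linear order; see Pipeline States for the actual rules.
PIPELINE_ORDER = ("bronze", "silver", "gold")


def next_state(state: str) -> Optional[str]:
    """Return the state after `state`, or None if it is already the last one."""
    i = PIPELINE_ORDER.index(state)  # ValueError for an unknown state
    return PIPELINE_ORDER[i + 1] if i + 1 < len(PIPELINE_ORDER) else None


def promote(conn: sqlite3.Connection, resource_id: str) -> Optional[str]:
    """Advance one resource by a single state; return the new state, or None if unchanged."""
    row = conn.execute(
        "SELECT pipeline_state FROM omnidata_resources WHERE id = ?", (resource_id,)
    ).fetchone()
    if row is None:
        raise KeyError(resource_id)
    current = row[0]
    target = next_state(current)
    if target is None:
        return None  # already gold
    # Compare-and-set: the WHERE clause re-checks the state read above.
    cur = conn.execute(
        "UPDATE omnidata_resources "
        "SET pipeline_state = ?, updated_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now') "
        "WHERE id = ? AND pipeline_state = ?",
        (target, resource_id, current),
    )
    return target if cur.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE omnidata_resources (id TEXT PRIMARY KEY, "
    "pipeline_state TEXT NOT NULL DEFAULT 'bronze', updated_at TEXT)"
)
conn.execute("INSERT INTO omnidata_resources (id) VALUES ('r1')")
steps = [promote(conn, "r1") for _ in range(3)]  # ['silver', 'gold', None]
```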

metadata

Text (JSON object). Adapter-specific data that doesn’t fit the core columns. Examples: email headers, message thread IDs, file permissions, capture context.
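
metadata is a JSON object stored as text, so an adapter that adds fields has to read it, merge, and write it back. Below is a minimal sketch using Python's json module. The shallow-merge policy, the sorted compact serialization, and the merge_metadata name are assumptions, not rules from the specification. The column defaults to '{}' but has no NOT NULL constraint, so the helper also accepts NULL.

```python
import json
from typing import Any, Optional


def merge_metadata(current: Optional[str], updates: dict[str, Any]) -> str:
    """Shallow-merge `updates` into a metadata JSON object and re-serialize it."""
    data = json.loads(current) if current else {}  # treat NULL or empty text as {}
    if not isinstance(data, dict):
        raise ValueError("metadata must be a JSON object")
    data.update(updates)  # top-level keys in `updates` overwrite existing ones
    # Sorted keys and compact separators give a stable text form for diffing.
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


meta = merge_metadata("{}", {"thread_id": "t-42", "headers": {"Subject": "Hi"}})
meta = merge_metadata(meta, {"thread_id": "t-43"})
print(meta)  # {"headers":{"Subject":"Hi"},"thread_id":"t-43"}
```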

created_at / updated_at / deleted_at

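Text (ISO 8601 UTC). Standard lifecycle timestamps. created_at records when OmniData ingested the resource, and updated_at records when the row last changed; both default to the insertion time. deleted_at is NULL for active records and holds the deletion time for soft-deleted records.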
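
A soft delete is an UPDATE rather than a DELETE, and readers filter on deleted_at IS NULL. Column DEFAULTs apply only on INSERT, so this sketch also sets updated_at explicitly; this section shows no trigger that would do it. The helper names and the trimmed table are illustrative.

```python
import sqlite3

NOW_UTC = "strftime('%Y-%m-%dT%H:%M:%SZ', 'now')"  # same expression as the schema defaults


def soft_delete(conn: sqlite3.Connection, resource_id: str) -> bool:
    """Mark an active resource deleted; return False if missing or already deleted."""
    cur = conn.execute(
        f"UPDATE omnidata_resources SET deleted_at = {NOW_UTC}, updated_at = {NOW_UTC} "
        "WHERE id = ? AND deleted_at IS NULL",
        (resource_id,),
    )
    return cur.rowcount == 1


def active_uris(conn: sqlite3.Connection) -> list[str]:
    """List URIs of resources that have not been soft-deleted."""
    rows = conn.execute(
        "SELECT uri FROM omnidata_resources WHERE deleted_at IS NULL ORDER BY uri"
    )
    return [uri for (uri,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE omnidata_resources (id TEXT PRIMARY KEY, uri TEXT NOT NULL UNIQUE, "
    "updated_at TEXT, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO omnidata_resources (id, uri) VALUES (?, ?)",
    [("a", "file:///a.txt"), ("b", "file:///b.txt")],
)
first = soft_delete(conn, "a")  # True
again = soft_delete(conn, "a")  # False: already deleted
remaining = active_uris(conn)   # ['file:///b.txt']
```

The row is kept, so the UNIQUE constraint on uri still covers it. Re-ingesting a soft-deleted URI therefore has to update the existing row (for example, by clearing deleted_at) instead of inserting a new one.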

Indexes

The following indexes are created by the bootstrap SQL in index.db:

CREATE INDEX idx_resources_uri ON omnidata_resources(uri);
CREATE INDEX idx_resources_source ON omnidata_resources(source);
CREATE INDEX idx_resources_pipeline ON omnidata_resources(pipeline_state);
CREATE INDEX idx_resources_content_hash ON omnidata_resources(content_hash);
CREATE INDEX idx_resources_deleted ON omnidata_resources(deleted_at);
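
SQLite's EXPLAIN QUERY PLAN shows whether a query actually uses one of these indexes. The sketch recreates two of them on a trimmed table and checks the plans for a filter on source and a filter on pipeline_state. The plan text varies between SQLite versions, so the helper keeps only the human-readable detail column. Note also that SQLite already maintains an automatic index to enforce the UNIQUE constraint on uri, so lookups by URI are indexed even without idx_resources_uri.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE omnidata_resources (
        id             TEXT PRIMARY KEY,
        uri            TEXT NOT NULL UNIQUE,
        source         TEXT NOT NULL,
        pipeline_state TEXT NOT NULL DEFAULT 'bronze'
    );
    CREATE INDEX idx_resources_source ON omnidata_resources(source);
    CREATE INDEX idx_resources_pipeline ON omnidata_resources(pipeline_state);
    """
)


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the detail text of each EXPLAIN QUERY PLAN row (always the last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


by_source = query_plan(
    conn, "SELECT id, uri FROM omnidata_resources WHERE source = ?", ("tosh",)
)
by_state = query_plan(
    conn, "SELECT id FROM omnidata_resources WHERE pipeline_state = ?", ("bronze",)
)
# e.g. ['SEARCH omnidata_resources USING INDEX idx_resources_source (source=?)']
print(by_source)
print(by_state)
```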

One row per URI

The uniqueness constraint on uri is foundational. An adapter that encounters the same item twice should update the existing row, not create a duplicate. The URI is the join point between the external world and the OmniData registry.
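
SQLite's UPSERT clause (INSERT ... ON CONFLICT(uri) DO UPDATE, available since SQLite 3.24) expresses this rule directly. The first sighting of a URI inserts a row, and later sightings update that row in place while keeping its original id. Which columns a re-ingest should overwrite is a policy choice. This sketch refreshes title, resource_at, and updated_at, and leaves source, resource_type, and deleted_at untouched. upsert_resource is a hypothetical helper.

```python
import sqlite3
import uuid

# The omnidata_resources DDL from the schema section.
SCHEMA = """
CREATE TABLE omnidata_resources (
    id              TEXT PRIMARY KEY,
    uri             TEXT NOT NULL UNIQUE,
    source          TEXT NOT NULL,
    resource_type   TEXT NOT NULL,
    title           TEXT,
    content_hash    TEXT,
    byte_size       INTEGER,
    mime_type       TEXT,
    resource_at     TEXT,
    pipeline_state  TEXT NOT NULL DEFAULT 'bronze',
    metadata        TEXT DEFAULT '{}',
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    updated_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now')),
    deleted_at      TEXT
);
"""

UPSERT = """
INSERT INTO omnidata_resources (id, uri, source, resource_type, title, resource_at)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT(uri) DO UPDATE SET
    title       = excluded.title,
    resource_at = excluded.resource_at,
    updated_at  = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
"""


def upsert_resource(conn, uri, source, resource_type, title=None, resource_at=None) -> str:
    """Insert a resource or update the row that already owns `uri`; return its id."""
    # The fresh UUID is used only if the URI is new; on conflict it is discarded.
    conn.execute(UPSERT, (str(uuid.uuid4()), uri, source, resource_type, title, resource_at))
    return conn.execute(
        "SELECT id FROM omnidata_resources WHERE uri = ?", (uri,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
uri = "tosh://+15551234567/msg-uuid"
first_id = upsert_resource(conn, uri, "tosh", "message", title="Draft")
second_id = upsert_resource(conn, uri, "tosh", "message", title="Final")
count, title = conn.execute("SELECT COUNT(*), MAX(title) FROM omnidata_resources").fetchone()
```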