Documentation Specification SDKs

Raw SQL

Every .omnidata container is a directory bundle containing standard SQLite databases. You do not need any SDK to work with them. Open the appropriate .db file with the sqlite3 CLI, a database browser, or any SQLite library in any language. This page covers the essential setup and common queries.

Container structure

instance.omnidata/
  index.db        — resources, chunks, embeddings, FTS5, deltas, queue, kv
  memory.db       — collections, edges, tags, memory records
  blobs/          — content-addressed files on the filesystem
  manifest.json   — container identity and metadata
  adapters.json   — adapter registry and state

Opening and configuring

sqlite3 CLI

# Open index.db for content search, resources, chunks
sqlite3 my-instance.omnidata/index.db

# Open memory.db for collections, memory, edges
sqlite3 my-instance.omnidata/memory.db

Once inside, set the required PRAGMAs:

PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA cache_size = -64000;

These PRAGMAs are not stored persistently (except journal_mode, which is). Set foreign_keys and cache_size every time you open a database.

Convenience: .sqliterc

Add to ~/.sqliterc so PRAGMAs are set automatically:

.mode column
.headers on
PRAGMA foreign_keys = ON;
PRAGMA cache_size = -64000;

Exploring the schema

-- List all tables (works in either database)
SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;

-- Describe a table
PRAGMA table_info(omnidata_resources);

To inspect the manifest or adapter registry, read the JSON files directly:

cat my-instance.omnidata/manifest.json | python3 -m json.tool
cat my-instance.omnidata/adapters.json | python3 -m json.tool

Common queries (index.db)

List recent resources

-- index.db
SELECT uri, title, source, pipeline_state, resource_at
FROM omnidata_resources
WHERE deleted_at IS NULL
ORDER BY resource_at DESC
LIMIT 20;

Count resources by source

-- index.db
SELECT source, COUNT(*) as count
FROM omnidata_resources
WHERE deleted_at IS NULL
GROUP BY source
ORDER BY count DESC;

Pipeline state summary

-- index.db
SELECT pipeline_state, COUNT(*) as count
FROM omnidata_resources
WHERE deleted_at IS NULL
GROUP BY pipeline_state;

Full-text search

-- index.db
SELECT c.content, r.title, r.uri
FROM fts_chunks fts
JOIN omnidata_chunks c ON c.rowid = fts.rowid
JOIN omnidata_resources r ON r.id = c.resource_id
WHERE fts_chunks MATCH 'search query here'
  AND c.deleted_at IS NULL
  AND r.deleted_at IS NULL
ORDER BY rank
LIMIT 10;

FTS5 syntax reference

-- Exact phrase
WHERE fts_chunks MATCH '"exact phrase"'

-- Boolean AND
WHERE fts_chunks MATCH 'term1 AND term2'

-- Boolean OR
WHERE fts_chunks MATCH 'term1 OR term2'

-- NOT
WHERE fts_chunks MATCH 'term1 NOT term2'

-- Proximity (within 5 tokens)
WHERE fts_chunks MATCH 'NEAR(term1 term2, 5)'

-- Column filter (if FTS table has multiple columns)
WHERE fts_chunks MATCH 'content:search terms'

View adapter configuration

Adapter configuration lives in adapters.json, not in a SQL table:

cat my-instance.omnidata/adapters.json | python3 -m json.tool

Find resources with blobs

Blobs are stored as content-addressed files in the blobs/ directory. To find resources that have blob content:

-- index.db
SELECT uri, title, mime_type, byte_size, content_hash
FROM omnidata_resources
WHERE content_hash IS NOT NULL
  AND deleted_at IS NULL
ORDER BY byte_size DESC
LIMIT 20;

Then read the blob from the filesystem:

# Blobs are stored at blobs/<first-2-chars>/<full-hash>
cat my-instance.omnidata/blobs/ab/abcdef1234...

Common queries (memory.db)

Browse collections

-- memory.db: Root collections
SELECT id, name, description
FROM omnidata_collections
WHERE parent_id IS NULL AND deleted_at IS NULL;

-- memory.db: Edges from a collection (target_id references resources in index.db)
SELECT target_id, edge_type
FROM omnidata_edges
WHERE source_id = '<collection-id>'
  AND edge_type = 'contains'
  AND deleted_at IS NULL;

Search memory

-- memory.db
SELECT * FROM omnidata_memory
WHERE memory_type = 'preference'
  AND deleted_at IS NULL;

What changed since yesterday?

-- index.db
SELECT uri, title, source, created_at
FROM omnidata_resources
WHERE created_at > datetime('now', '-1 day')
  AND deleted_at IS NULL
ORDER BY created_at DESC;

Write operations

When writing to databases inside an .omnidata container via raw SQL, follow these conventions:

  • Generate UUID v4 for all id columns
  • Use ISO 8601 UTC for all timestamps: strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
  • Never hard-delete: use UPDATE ... SET deleted_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
  • Always include deleted_at IS NULL in SELECT queries
  • Set pipeline_state = 'bronze' for new resources — let the runtime handle promotion
  • Write resources and chunks to index.db, collections and memory to memory.db
  • Store blob content in blobs/ as content-addressed files, not in SQLite
-- index.db: Soft-delete a resource
UPDATE omnidata_resources
SET deleted_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
WHERE uri = 'file:///path/to/removed-file.txt';