Raw SQL
Every .omnidata container is a directory bundle containing standard SQLite databases. You do not need any SDK to work with them. Open the appropriate .db file with the sqlite3 CLI, a database browser, or any SQLite library in any language. This page covers the essential setup and common queries.
Container structure
instance.omnidata/
index.db — resources, chunks, embeddings, FTS5, deltas, queue, kv
memory.db — collections, edges, tags, memory records
blobs/ — content-addressed files on the filesystem
manifest.json — container identity and metadata
adapters.json — adapter registry and state
Opening and configuring
sqlite3 CLI
# Open index.db for content search, resources, chunks
sqlite3 my-instance.omnidata/index.db
# Open memory.db for collections, memory, edges
sqlite3 my-instance.omnidata/memory.db
Once inside, set the required PRAGMAs:
PRAGMA journal_mode = WAL;
PRAGMA foreign_keys = ON;
PRAGMA cache_size = -64000;
These PRAGMAs are not stored persistently (except journal_mode, which is). Set foreign_keys and cache_size every time you open a database.
Convenience: .sqliterc
Add to ~/.sqliterc so PRAGMAs are set automatically:
.mode column
.headers on
PRAGMA foreign_keys = ON;
PRAGMA cache_size = -64000;
Exploring the schema
-- List all tables (works in either database)
SELECT name FROM sqlite_master WHERE type='table' ORDER BY name;
-- Describe a table
PRAGMA table_info(omnidata_resources);
To inspect the manifest or adapter registry, read the JSON files directly:
cat my-instance.omnidata/manifest.json | python3 -m json.tool
cat my-instance.omnidata/adapters.json | python3 -m json.tool
Common queries (index.db)
List recent resources
-- index.db
SELECT uri, title, source, pipeline_state, resource_at
FROM omnidata_resources
WHERE deleted_at IS NULL
ORDER BY resource_at DESC
LIMIT 20;
Count resources by source
-- index.db
SELECT source, COUNT(*) as count
FROM omnidata_resources
WHERE deleted_at IS NULL
GROUP BY source
ORDER BY count DESC;
Pipeline state summary
-- index.db
SELECT pipeline_state, COUNT(*) as count
FROM omnidata_resources
WHERE deleted_at IS NULL
GROUP BY pipeline_state;
Full-text search
-- index.db
SELECT c.content, r.title, r.uri
FROM fts_chunks fts
JOIN omnidata_chunks c ON c.rowid = fts.rowid
JOIN omnidata_resources r ON r.id = c.resource_id
WHERE fts_chunks MATCH 'search query here'
AND c.deleted_at IS NULL
AND r.deleted_at IS NULL
ORDER BY rank
LIMIT 10;
FTS5 syntax reference
-- Exact phrase
WHERE fts_chunks MATCH '"exact phrase"'
-- Boolean AND
WHERE fts_chunks MATCH 'term1 AND term2'
-- Boolean OR
WHERE fts_chunks MATCH 'term1 OR term2'
-- NOT
WHERE fts_chunks MATCH 'term1 NOT term2'
-- Proximity (within 5 tokens)
WHERE fts_chunks MATCH 'NEAR(term1 term2, 5)'
-- Column filter (if FTS table has multiple columns)
WHERE fts_chunks MATCH 'content:search terms'
View adapter configuration
Adapter configuration lives in adapters.json, not in a SQL table:
cat my-instance.omnidata/adapters.json | python3 -m json.tool
Find resources with blobs
Blobs are stored as content-addressed files in the blobs/ directory. To find resources that have blob content:
-- index.db
SELECT uri, title, mime_type, byte_size, content_hash
FROM omnidata_resources
WHERE content_hash IS NOT NULL
AND deleted_at IS NULL
ORDER BY byte_size DESC
LIMIT 20;
Then read the blob from the filesystem:
# Blobs are stored at blobs/<first-2-chars>/<full-hash>
cat my-instance.omnidata/blobs/ab/abcdef1234...
Common queries (memory.db)
Browse collections
-- memory.db: Root collections
SELECT id, name, description
FROM omnidata_collections
WHERE parent_id IS NULL AND deleted_at IS NULL;
-- memory.db: Edges from a collection (target_id references resources in index.db)
SELECT target_id, edge_type
FROM omnidata_edges
WHERE source_id = '<collection-id>'
AND edge_type = 'contains'
AND deleted_at IS NULL;
Search memory
-- memory.db
SELECT * FROM omnidata_memory
WHERE memory_type = 'preference'
AND deleted_at IS NULL;
What changed since yesterday?
-- index.db
SELECT uri, title, source, created_at
FROM omnidata_resources
WHERE created_at > datetime('now', '-1 day')
AND deleted_at IS NULL
ORDER BY created_at DESC;
Write operations
When writing to databases inside an .omnidata container via raw SQL, follow these conventions:
- Generate UUID v4 for all
idcolumns - Use ISO 8601 UTC for all timestamps:
strftime('%Y-%m-%dT%H:%M:%SZ', 'now') - Never hard-delete: use
UPDATE ... SET deleted_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now') - Always include
deleted_at IS NULLin SELECT queries - Set
pipeline_state = 'bronze'for new resources — let the runtime handle promotion - Write resources and chunks to
index.db, collections and memory tomemory.db - Store blob content in
blobs/as content-addressed files, not in SQLite
-- index.db: Soft-delete a resource
UPDATE omnidata_resources
SET deleted_at = strftime('%Y-%m-%dT%H:%M:%SZ', 'now')
WHERE uri = 'file:///path/to/removed-file.txt';