
Search Patterns

OmniData containers are directory bundles containing SQLite databases. You can search them with raw SQL. This page covers the most common patterns and which database to target.

Which database?

| Database | Contains | Use for |
| --- | --- | --- |
| index.db | resources, chunks, embeddings, FTS5, deltas, queue, kv | Content search, FTS5, vector similarity, pipeline queries |
| memory.db | collections, edges, tags, memory records | Memory queries, collection browsing, graph traversal |
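
Because a container is a plain directory, opening one of its databases is an ordinary SQLite connection. The following is a minimal sketch, assuming index.db and memory.db sit at the top level of the bundle (the exact layout is not spelled out on this page) and that read-only access is enough for search. open_container_db is a hypothetical helper name, not part of any OmniData SDK.

```python
import sqlite3
from pathlib import Path


def open_container_db(container_dir: str, db_name: str = "index.db") -> sqlite3.Connection:
    """Open one database inside a container bundle in read-only mode.

    Assumes db_name ("index.db" or "memory.db") sits at the top level of
    container_dir; adjust the path if your bundle is laid out differently.
    """
    db_path = (Path(container_dir) / db_name).resolve()
    if not db_path.is_file():
        raise FileNotFoundError(f"{db_name} not found in {container_dir}")
    # mode=ro makes SQLite reject writes, so a search script cannot
    # accidentally modify the container.
    conn = sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row  # access columns by name, e.g. row["title"]
    return conn
```

Opening read-only is a design choice for search tooling: queries on this page never need to write, and it keeps the bundle safe from stray statements.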

Full-text search (FTS5)

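FTS5 ranks keyword matches with BM25. The fts_chunks virtual table in index.db indexes all chunk content, and its rank column holds the BM25 score. Lower values mean better matches, so ORDER BY rank returns the best matches first.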

-- index.db: Simple keyword search
SELECT c.id, c.content, r.title, r.uri
FROM fts_chunks fts
JOIN omnidata_chunks c ON c.rowid = fts.rowid
JOIN omnidata_resources r ON r.id = c.resource_id
WHERE fts_chunks MATCH 'machine learning'
  AND c.deleted_at IS NULL
  AND r.deleted_at IS NULL
ORDER BY rank
LIMIT 20;
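
From application code, the same query is best run with bound parameters rather than string formatting. A minimal sketch using Python's sqlite3 module; search_fts is a hypothetical helper name, and it assumes the table and column names shown in the query above and an SQLite build with FTS5 enabled.

```python
import sqlite3


def search_fts(conn: sqlite3.Connection, match_expr: str, limit: int = 20) -> list[tuple]:
    """Run the keyword query above with bound parameters.

    Returns (chunk_id, content, title, uri) tuples, best match first.
    """
    sql = """
        SELECT c.id, c.content, r.title, r.uri
        FROM fts_chunks fts
        JOIN omnidata_chunks c ON c.rowid = fts.rowid
        JOIN omnidata_resources r ON r.id = c.resource_id
        WHERE fts_chunks MATCH ?
          AND c.deleted_at IS NULL
          AND r.deleted_at IS NULL
        ORDER BY rank
        LIMIT ?
    """
    return conn.execute(sql, (match_expr, limit)).fetchall()
```

Because the chunk ID is the first element of each tuple, the returned list can be used directly wherever a ranked list keyed by chunk ID is needed.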

FTS5 supports the operators AND, OR, NOT, and NEAR (written in uppercase), plus phrase matching with double quotes. Terms separated only by whitespace are implicitly ANDed.

-- Phrase match
WHERE fts_chunks MATCH '"reciprocal rank fusion"'

-- Boolean operators
WHERE fts_chunks MATCH 'vector AND search NOT image'

-- Proximity: terms within 10 tokens of each other
WHERE fts_chunks MATCH 'NEAR(adapter sync, 10)'
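
Since MATCH expressions have their own syntax, raw user input containing quotes, parentheses, or words such as AND can raise a syntax error or change the meaning of the query. One hedged way to handle this is to quote every term before binding it; to_fts5_query below is a hypothetical helper, sketched under the assumption that you want plain all-terms-must-match behaviour for user input.

```python
def to_fts5_query(user_text: str) -> str:
    """Turn free-form user text into a safe FTS5 MATCH expression.

    Each whitespace-separated term becomes a quoted FTS5 string, so
    characters such as - : * ( ) and words like AND or NEAR are treated
    as search terms rather than query syntax. Terms are implicitly ANDed.
    Returns an empty string for blank input; skip the query in that case.
    """
    terms = user_text.split()
    # Inside an FTS5 string, a literal double quote is written as "".
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)
```

The quoted terms still pass through the table's tokenizer, so punctuation inside a term may be dropped at match time. Use the raw operator syntax shown above when you control the query text yourself.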

Vector similarity search

Vector search requires computing cosine similarity between a query embedding and stored embeddings in index.db. Since SQLite does not have a built-in vector distance function, this is typically done in application code.

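import math
import sqlite3
import struct

def cosine_similarity(a: bytes, b: bytes) -> float:
    """Cosine similarity of two vectors packed as little-endian float32."""
    # zip() would silently truncate mismatched vectors, so fail loudly instead.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    n = len(a) // 4  # 4 bytes per float32
    va = struct.unpack(f"<{n}f", a)
    vb = struct.unpack(f"<{n}f", b)
    dot = sum(x * y for x, y in zip(va, vb))
    mag_a = math.sqrt(sum(x * x for x in va))
    mag_b = math.sqrt(sum(x * x for x in vb))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)

# Open index.db, embed the query, then score all gold chunks.
# embed() stands in for your embedding model; it must return the query
# vector as float32 bytes in the same layout as the stored embeddings.
cursor = sqlite3.connect("index.db").cursor()  # path to index.db inside the container
query_embedding = embed("what is OmniData?")
rows = cursor.execute("""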
    SELECT c.id, c.content, c.embedding, r.title, r.uri
    FROM omnidata_chunks c
    JOIN omnidata_resources r ON r.id = c.resource_id
    WHERE c.embedding IS NOT NULL
      AND c.deleted_at IS NULL AND r.deleted_at IS NULL
""").fetchall()

scored = [
    (cosine_similarity(query_embedding, row[2]), row)
    for row in rows
]
scored.sort(key=lambda x: x[0], reverse=True)
top_results = scored[:20]
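
The scoring code reads embeddings as little-endian float32 bytes, so the query embedding has to arrive in the same layout. Assuming that is how the container stores vectors (this is implied by the unpack format above rather than stated), a model that returns Python floats can be adapted with a pair of small helpers. The names pack_embedding and unpack_embedding are hypothetical.

```python
import struct


def pack_embedding(vector: list[float]) -> bytes:
    """Pack floats as little-endian float32, the layout cosine_similarity unpacks."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> tuple[float, ...]:
    """Inverse of pack_embedding; each float32 occupies 4 bytes."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)
```

A quick sanity check before scoring is to compare len(query_embedding) with the length of one stored embedding: a mismatch means the query was embedded with a different model or dimension.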

Reciprocal Rank Fusion (RRF)

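Combine FTS5 and vector results (both from index.db) into a single ranked list. Each input must be a list of tuples ordered best-first, with the chunk ID as the first element. The scored list from the vector example holds (similarity, row) pairs, so pass [row for _, row in scored] rather than scored itself: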

def rrf_fuse(vector_results, fts_results, k=60, vector_weight=0.6, fts_weight=0.4):
    scores = {}

    for rank, (chunk_id, *_) in enumerate(vector_results, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0) + vector_weight / (k + rank)

    for rank, (chunk_id, *_) in enumerate(fts_results, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0) + fts_weight / (k + rank)

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
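
A small worked example may help show how the fusion behaves. The chunk IDs and placeholder payloads below are made up for illustration, and the function is repeated only so the snippet runs on its own.

```python
def rrf_fuse(vector_results, fts_results, k=60, vector_weight=0.6, fts_weight=0.4):
    # Same function as above, repeated so this example is self-contained.
    scores = {}
    for rank, (chunk_id, *_) in enumerate(vector_results, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0) + vector_weight / (k + rank)
    for rank, (chunk_id, *_) in enumerate(fts_results, start=1):
        scores[chunk_id] = scores.get(chunk_id, 0) + fts_weight / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)


# Toy inputs: tuples ordered best-first with the chunk ID in position 0.
vector_results = [("a", "..."), ("b", "..."), ("c", "...")]
fts_results = [("b", "..."), ("d", "...")]

fused = rrf_fuse(vector_results, fts_results)
ranking = [chunk_id for chunk_id, _ in fused]
# "b" appears in both lists, so 0.6/62 + 0.4/61 beats "a" at 0.6/61 alone.
print(ranking)  # ['b', 'a', 'c', 'd']
```

Only ranks enter the formula, never the raw BM25 or cosine scores, which is why the two result lists can be combined without normalising their scales.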

Delta search: “What changed?”

Query resources that were created or updated since a given timestamp (from index.db):

-- index.db: New resources since last check
SELECT uri, title, source, pipeline_state, created_at
FROM omnidata_resources
WHERE created_at > '2026-03-28T00:00:00Z'
  AND deleted_at IS NULL
ORDER BY created_at DESC;

-- index.db: Updated resources (content changed)
SELECT uri, title, updated_at
FROM omnidata_resources
WHERE updated_at > '2026-03-28T00:00:00Z'
  AND created_at <= '2026-03-28T00:00:00Z'  -- <= so a resource created exactly at the cutoff is not missed by both queries
  AND deleted_at IS NULL;
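
In a polling script the cutoff is usually a variable rather than a literal. The sketch below binds it as a parameter; it assumes timestamps are stored as ISO 8601 UTC text in exactly the format of the literals above, so that string comparison matches chronological order. utc_iso and new_resources_since are hypothetical helper names.

```python
import sqlite3
from datetime import datetime, timezone


def utc_iso(dt: datetime) -> str:
    """Format a timezone-aware datetime like the literals above (…T00:00:00Z)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def new_resources_since(conn: sqlite3.Connection, since: datetime) -> list[tuple]:
    """Resources created after `since`, newest first."""
    sql = """
        SELECT uri, title, source, pipeline_state, created_at
        FROM omnidata_resources
        WHERE created_at > ?
          AND deleted_at IS NULL
        ORDER BY created_at DESC
    """
    return conn.execute(sql, (utc_iso(since),)).fetchall()
```

Pass timezone-aware datetimes: astimezone() interprets a naive datetime as local time, which would shift the cutoff.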

For more granular change tracking, query omnidata_deltas in index.db:

-- index.db
SELECT d.*, r.title, r.uri
FROM omnidata_deltas d
JOIN omnidata_resources r ON r.id = d.resource_id
WHERE d.created_at > '2026-03-28T00:00:00Z'
ORDER BY d.created_at DESC;

Memory queries

Search structured knowledge in memory.db:

-- memory.db: Find all preferences
SELECT * FROM omnidata_memory
WHERE memory_type = 'preference'
  AND deleted_at IS NULL;

-- memory.db: Search memory content
SELECT * FROM omnidata_memory
WHERE content LIKE '%coffee%'
  AND deleted_at IS NULL;
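
When the search term comes from a user, bind it as a parameter and escape the LIKE wildcards so that % and _ in the input are matched literally. A minimal sketch, assuming the omnidata_memory columns used above; search_memory is a hypothetical helper name.

```python
import sqlite3


def search_memory(conn: sqlite3.Connection, term: str) -> list[tuple]:
    """Substring search over memory content with the term bound as a parameter."""
    # Escape the escape character first, then the two LIKE wildcards,
    # so a term such as "100%" is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = """
        SELECT * FROM omnidata_memory
        WHERE content LIKE ? ESCAPE '\\'
          AND deleted_at IS NULL
    """
    return conn.execute(sql, (f"%{escaped}%",)).fetchall()
```

A leading-wildcard LIKE scans every row, which is fine for small memory tables but worth keeping in mind as they grow.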

Collection queries

Browse collections and their contents in memory.db:

-- memory.db: Root collections
SELECT id, name, description
FROM omnidata_collections
WHERE parent_id IS NULL AND deleted_at IS NULL;

-- memory.db: Resources linked to a collection (edges reference resource IDs from index.db)
SELECT e.target_id, e.edge_type
FROM omnidata_edges e
WHERE e.source_id = '<collection-id>'
  AND e.edge_type = 'contains'
  AND e.deleted_at IS NULL;
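
The edge query returns only resource IDs, and the resources themselves live in index.db. One way to resolve them in a single query is SQLite's ATTACH DATABASE, sketched below. It assumes the table and column names used on this page; the function name and the idx schema alias are illustrative choices.

```python
import sqlite3


def resources_in_collection(memory_db: str, index_db: str, collection_id: str) -> list[tuple]:
    """Resolve a collection's 'contains' edges to resource rows.

    Opens memory.db, attaches index.db under the schema name idx, and
    joins edge targets against idx.omnidata_resources.
    """
    conn = sqlite3.connect(memory_db)
    try:
        conn.execute("ATTACH DATABASE ? AS idx", (index_db,))
        sql = """
            SELECT r.id, r.title, r.uri
            FROM omnidata_edges e
            JOIN idx.omnidata_resources r ON r.id = e.target_id
            WHERE e.source_id = ?
              AND e.edge_type = 'contains'
              AND e.deleted_at IS NULL
              AND r.deleted_at IS NULL
            ORDER BY r.title
        """
        return conn.execute(sql, (collection_id,)).fetchall()
    finally:
        conn.close()
```

The alternative is two separate connections with the join done in application code, which avoids ATTACH but costs an extra round trip per collection.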

Scoping by source

Filter results to a specific adapter’s content (from index.db):

-- index.db
SELECT c.content, r.title
FROM omnidata_chunks c
JOIN omnidata_resources r ON r.id = c.resource_id
WHERE r.source = 'chrome-capture'
  AND c.deleted_at IS NULL AND r.deleted_at IS NULL;
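
To find out which source values exist before filtering on one, a grouped count over the same join can help. This is a sketch under the same schema assumptions as the query above; chunk_counts_by_source is a hypothetical helper name.

```python
import sqlite3


def chunk_counts_by_source(conn: sqlite3.Connection) -> dict[str, int]:
    """Map each adapter source to its number of live (non-deleted) chunks."""
    sql = """
        SELECT r.source, COUNT(*)
        FROM omnidata_chunks c
        JOIN omnidata_resources r ON r.id = c.resource_id
        WHERE c.deleted_at IS NULL AND r.deleted_at IS NULL
        GROUP BY r.source
    """
    return dict(conn.execute(sql).fetchall())
```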