Pipeline States
Every resource in an OmniData container moves through a three-stage pipeline: bronze, silver, and gold. This progression tracks how far a piece of content has been processed, from raw metadata to fully searchable knowledge.
The current state is stored in the pipeline_state column on the omnidata_resources table in index.db.
The three states
Bronze (raw ingested)
A resource enters the container in the bronze state. At this point, OmniData knows the resource exists — its URI, source adapter, title, MIME type, and timestamps are recorded — but the actual content has not yet been processed.
Bronze means: “We know about this item. We have its metadata. We may have its raw binary stored in the blobs/ directory. But it has not been chunked or embedded.”
A bronze resource is discoverable by metadata queries but invisible to semantic search.
Silver (chunked)
A silver resource has had its content extracted and split into text segments stored in omnidata_chunks (inside index.db). The chunking strategy depends on content type — sliding window for prose, tree-sitter for code, message boundaries for conversations.
Silver means: “The content has been broken into searchable text segments. Full-text search (FTS5) can find this resource. But vector embeddings have not yet been generated.”
A silver resource appears in FTS5 keyword searches but not in vector similarity queries.
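As an illustration of the sliding-window strategy mentioned for prose, here is a minimal sketch. The function name, the word-based windowing, and the `size` and `overlap` defaults are assumptions made for this example; the source does not specify how OmniData sizes or overlaps its windows.

```python
def sliding_window_chunks(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows of words (hypothetical parameters)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


print(sliding_window_chunks("a b c d e f g", size=4, overlap=2))
# ['a b c d', 'c d e f', 'e f g']
```

The overlap means a sentence that straddles a window boundary still appears intact in at least one chunk, at the cost of storing some text twice.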
Gold (embedded and searchable)
A gold resource is fully processed. Every chunk has a vector embedding stored in index.db as a BLOB of little-endian float32 values, along with a record of which embedding model produced it. The resource is now findable by both keyword search and semantic similarity.
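To make the storage format concrete, the sketch below packs a vector into a little-endian float32 byte string and reads it back using Python's standard `struct` module. The helper names are hypothetical, and the source does not say which table or column holds the BLOB, so only the byte layout is shown.

```python
import struct


def pack_embedding(vector: list[float]) -> bytes:
    """Encode floats as consecutive little-endian float32 values (4 bytes each)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> list[float]:
    """Decode a little-endian float32 BLOB back into a list of floats."""
    if len(blob) % 4 != 0:
        raise ValueError("BLOB length must be a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


blob = pack_embedding([1.0, -2.5])
print(blob.hex())              # 0000803f000020c0
print(unpack_embedding(blob))  # [1.0, -2.5]
```

Because each value occupies exactly 4 bytes, the vector dimension can be recovered from the BLOB length alone.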
Gold means: “This resource is fully indexed. Reciprocal rank fusion (RRF) search, which merges the keyword and vector rankings, will include it in results.”
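For readers unfamiliar with RRF (reciprocal rank fusion), the following sketch shows the general technique: each result list contributes 1 / (k + rank) to an item's score, and items are sorted by the summed score. The function name and the constant k = 60 (a commonly used default) are assumptions here; the source does not describe how OmniData implements or tunes its fusion.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of URIs by summing 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, uri in enumerate(ranking, start=1):
            scores[uri] = scores.get(uri, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda uri: (-scores[uri], uri))


keyword_hits = ["a", "b", "c"]  # e.g. FTS5 results (silver and gold resources)
vector_hits = ["b", "d"]        # e.g. similarity results (gold resources only)
print(rrf_fuse([keyword_hits, vector_hits]))
# ['b', 'a', 'd', 'c']
```

In this toy input, "b" ranks first because it appears in both lists, which is the situation only a gold resource can be in.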
Pipeline promotion
Resources move forward through the pipeline, never backward. The typical flow:
- Adapter sync creates a bronze resource (metadata + optional blob in blobs/)
- Chunker reads the content, splits it into segments, promotes to silver
- Embedder generates vectors for each chunk, promotes to gold
Each stage is idempotent. Re-running the chunker on an already-silver resource is a no-op. Re-running the embedder on an already-gold resource is a no-op.
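One way to get forward-only, idempotent promotion is to make the state change conditional on the expected current state. The sketch below assumes this approach; the `promote` helper and the two-column table are hypothetical simplifications of omnidata_resources, not the actual schema or implementation.

```python
import sqlite3


def promote(conn: sqlite3.Connection, uri: str, from_state: str, to_state: str) -> bool:
    """Advance one resource only if it is currently in from_state.

    Returns True if the row changed, False if the call was a no-op.
    """
    cur = conn.execute(
        "UPDATE omnidata_resources SET pipeline_state = ? "
        "WHERE uri = ? AND pipeline_state = ?",
        (to_state, uri, from_state),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical, reduced schema for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE omnidata_resources (uri TEXT PRIMARY KEY, pipeline_state TEXT NOT NULL)"
)
conn.execute("INSERT INTO omnidata_resources VALUES ('file:///notes.md', 'bronze')")
conn.commit()

print(promote(conn, "file:///notes.md", "bronze", "silver"))  # True
print(promote(conn, "file:///notes.md", "bronze", "silver"))  # False (already silver)
```

Because the WHERE clause names the expected state, a repeated or late-arriving promotion cannot move a resource backward.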
Querying by state
You can query resources at any pipeline state. These queries run against index.db:
-- Find resources still waiting to be chunked
SELECT uri, title FROM omnidata_resources
WHERE pipeline_state = 'bronze' AND deleted_at IS NULL;
-- Count resources at each stage
SELECT pipeline_state, COUNT(*) FROM omnidata_resources
WHERE deleted_at IS NULL
GROUP BY pipeline_state;
Why three states?
The pipeline exists because chunking and embedding are expensive operations. By separating them from ingestion and from each other, the system can:
- Ingest quickly — adapters write metadata and move on, without blocking on embedding
- Process in bulk — a background worker promotes resources in batches
- Resume after interruption — if the process crashes mid-embedding, bronze and silver resources are still safe and can be promoted on the next run
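The batch and resume behaviour can be sketched as a worker that commits one resource at a time, so a crash leaves every unfinished resource in its previous state. The `promote_batch` function, the `chunk_fn` callback, the `batch_size` default, and the reduced table are assumptions for illustration; the source does not describe the real worker.

```python
import sqlite3


def promote_batch(conn: sqlite3.Connection, chunk_fn, batch_size: int = 100) -> int:
    """Chunk up to batch_size bronze resources; return how many were promoted."""
    rows = conn.execute(
        "SELECT uri FROM omnidata_resources "
        "WHERE pipeline_state = 'bronze' AND deleted_at IS NULL "
        "ORDER BY uri LIMIT ?",
        (batch_size,),
    ).fetchall()
    promoted = 0
    for (uri,) in rows:
        with conn:  # commits on success, rolls back if chunk_fn raises
            chunk_fn(uri)
            conn.execute(
                "UPDATE omnidata_resources SET pipeline_state = 'silver' "
                "WHERE uri = ? AND pipeline_state = 'bronze'",
                (uri,),
            )
        promoted += 1
    return promoted


# Hypothetical, reduced schema for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE omnidata_resources "
    "(uri TEXT PRIMARY KEY, pipeline_state TEXT NOT NULL, deleted_at TEXT)"
)
conn.executemany(
    "INSERT INTO omnidata_resources (uri, pipeline_state) VALUES (?, 'bronze')",
    [("a",), ("b",), ("c",)],
)
conn.commit()


def flaky_chunker(uri: str) -> None:
    if uri == "b":
        raise RuntimeError("simulated crash")


try:
    promote_batch(conn, flaky_chunker)  # promotes "a", then fails on "b"
except RuntimeError:
    pass

print(promote_batch(conn, lambda uri: None))  # 2 ("b" and "c" are picked up again)
```

The second call needs no bookkeeping beyond pipeline_state itself: whatever is still bronze is, by definition, the remaining work.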