Architecture

OmniData is a layered system. The directory bundle sits at the bottom. Everything else builds on top.

┌─────────────────────────────────────────────────┐
│                   Surfaces                       │
│  Chrome Extension  │  CLI  │  MCP Server  │  App │
├─────────────────────────────────────────────────┤
│                    Runtime                       │
│   Adapters  │  Chunker  │  Embedder  │  Search  │
├─────────────────────────────────────────────────┤
│               Directory Bundle                   │
│            .omnidata/ (v0.2.0)                   │
│                                                  │
│  manifest.json        identity + schema version  │
│  index.db             resources, chunks, FTS5,   │
│                       embeddings, queue, kv      │
│  memory.db            collections, edges, tags,  │
│                       memory records             │
│  adapters.json        adapter registry + state   │
│  ingress.log          append-only ingest log     │
│  blobs/               content-addressed files    │
│                       (SHA-256 fanout dirs)      │
└─────────────────────────────────────────────────┘

Three layers

1. Directory bundle (the contract)

The .omnidata directory is a bundle containing several files and one blob directory, each with a focused responsibility:

File            Purpose
manifest.json   Identity: who owns this container, what hat it belongs to, schema version
index.db        Search layer: resources, chunks with embeddings, FTS5 indexes, queue, key-value store
memory.db       Graph layer: collections, edges, tags, memory records with emotional salience
blobs/          Content-addressed filesystem storage with SHA-256 fanout
adapters.json   What adapters feed this instance, their state and configuration
ingress.log     Append-only log of every ingest operation for debugging and replay

The two databases have distinct concerns. index.db handles everything needed to find content: resource metadata, text chunks, vector embeddings, and full-text search indexes. memory.db handles everything needed to organize and relate content: hierarchies, graphs, and structured knowledge.
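
To make the layout concrete, here is a minimal sketch of opening a bundle. The file names come from the table above; the helper names (missing_entries, open_bundle) are hypothetical and not part of any published OmniData API, and nothing is assumed about the manifest fields or the database schemas.

```python
import json
import sqlite3
from pathlib import Path

# Entry names are taken from the bundle layout; everything else is illustrative.
EXPECTED_ENTRIES = ("manifest.json", "index.db", "memory.db",
                    "adapters.json", "ingress.log", "blobs")


def missing_entries(bundle: Path) -> list[str]:
    """Return the expected bundle entries that are absent from `bundle`."""
    return [name for name in EXPECTED_ENTRIES if not (bundle / name).exists()]


def open_bundle(bundle: Path):
    """Load the manifest and open both databases (hypothetical helper)."""
    missing = missing_entries(bundle)
    if missing:
        raise FileNotFoundError(f"{bundle} is missing: {', '.join(missing)}")
    manifest = json.loads((bundle / "manifest.json").read_text(encoding="utf-8"))
    index_db = sqlite3.connect(bundle / "index.db")    # search layer
    memory_db = sqlite3.connect(bundle / "memory.db")  # graph layer
    return manifest, index_db, memory_db
```

Keeping the two connections separate mirrors the split described above: queries that find content go to index.db, queries that relate content go to memory.db.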

2. Runtime (the engine)

The runtime is a library that operates on .omnidata bundles:

  • Adapter orchestration: Scheduling syncs, managing watermarks, enqueuing new content
  • Text chunking: Splitting documents into segments (sliding window for text, tree-sitter for code)
  • Embedding: Generating vector representations via models such as Nomic's nomic-embed-text
  • RRF search: Combining vector similarity and full-text search with Reciprocal Rank Fusion
  • Blob management: Content-addressed filesystem storage with SHA-256 naming and fanout directories
  • Pipeline promotion: Moving resources from bronze (raw) to silver (chunked) to gold (searchable)
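
The RRF step in the list above can be sketched in a few lines. This is a generic Reciprocal Rank Fusion, not OmniData's actual code: each ranked list adds 1 / (k + rank) to an item's score. The function name rrf_fuse, the chunk IDs, and the constant k = 60 (the value commonly used in the RRF literature) are assumptions; the runtime may use a different constant or weighting.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of IDs; each list is ordered best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical chunk IDs from the two retrievers
vector_hits = ["c3", "c1", "c7"]  # vector similarity order
fts_hits = ["c1", "c9", "c3"]     # full-text (FTS5) order
fused = rrf_fuse([vector_hits, fts_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Because RRF uses only ranks, the vector distances and FTS5 scores never need to be normalized onto a common scale.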

3. Surfaces (the interfaces)

Surfaces are applications that use the runtime:

  • CLI: omnidata init, omnidata search, omnidata ingest
  • MCP server: Exposes OmniData as tools for AI agents
  • Chrome extension: Browser capture into the active .omnidata instance
  • Desktop app: Native macOS/Windows application
  • API server: Thin HTTP layer for consumers that can’t access the filesystem

Data flow

Source (browser, voice, filesystem, messages)
    │
    ▼
Adapter (discovers new/changed items)
    │
    ▼
Queue (index.db — pending work items)
    │
    ▼
Worker (dequeues, reads content, chunks, embeds)
    │
    ├──▶ blobs/          (raw binary → filesystem, SHA-256 named)
    ├──▶ index.db        (resource metadata, text chunks + embeddings, FTS5)
    ├──▶ memory.db       (collections, edges, relationships)
    └──▶ ingress.log     (append-only operation record)
    │
    ▼
Search (RRF: vector + FTS5 → fused results)

Binary content flows to the filesystem as content-addressed blobs. Metadata and searchable text flow to index.db. Organizational structure and relationships flow to memory.db. Every operation is recorded in ingress.log.
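
As an illustration of the blob and log paths, the sketch below stores a payload under its SHA-256 digest and appends one record to ingress.log. Several details are assumptions: the fanout depth (here the first two hex characters form the subdirectory), the JSON Lines log format, the record fields, and the helper names store_blob and log_ingest. The source does not specify any of these.

```python
import hashlib
import json
import time
from pathlib import Path


def store_blob(bundle: Path, data: bytes) -> str:
    """Write `data` under blobs/ using its SHA-256 digest as the name."""
    digest = hashlib.sha256(data).hexdigest()
    # Assumed fanout: the first two hex characters form the subdirectory
    target = bundle / "blobs" / digest[:2] / digest[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return digest


def log_ingest(bundle: Path, digest: str, source: str) -> None:
    """Append one record to ingress.log (JSON Lines is an assumption)."""
    record = {"ts": time.time(), "op": "ingest", "sha256": digest, "source": source}
    with open(bundle / "ingress.log", "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
```

Naming a blob by its digest means the same bytes always map to the same path, so re-ingesting unchanged content does not duplicate storage.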

Instance isolation

Each .omnidata bundle is fully independent. A person may have many:

~/.local/share/eidosomni/instances/
├── director-of-ai.omnidata/
│   ├── manifest.json
│   ├── index.db
│   ├── memory.db
│   ├── blobs/
│   ├── adapters.json
│   └── ingress.log
├── eidos.omnidata/
├── greenmark.omnidata/
├── health.omnidata/
├── financial-manager.omnidata/
└── builder.omnidata/

Each instance has its own adapters, search index, blob store, and content. Moving to a new machine means copying the directories. Because .omnidata is a bundle, Finder (macOS) treats each one as a single item – drag, drop, zip, or back up as one unit.
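
Since every instance is a self-contained directory, discovering and migrating instances needs nothing beyond ordinary filesystem calls. The sketch below assumes the instances root shown above; list_instances and copy_instance are hypothetical helpers, not OmniData commands.

```python
import shutil
from pathlib import Path


def list_instances(root: Path) -> list[str]:
    """Return the names of all *.omnidata directories directly under `root`."""
    return sorted(p.name for p in root.glob("*.omnidata") if p.is_dir())


def copy_instance(root: Path, name: str, dest_root: Path) -> Path:
    """Copy one bundle directory, with all of its contents, to `dest_root`."""
    dest = dest_root / name
    shutil.copytree(root / name, dest)  # raises if dest already exists
    return dest
```

For example, list_instances(Path.home() / ".local/share/eidosomni/instances") would return names such as health.omnidata for the layout shown above.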

Federation (future)

A future Meta-Omni layer will enable searching across multiple .omnidata instances. Each bundle must work alone first. Federation is additive, never required.