Adapters

Adapters are the ingress layer of OmniData. Each adapter knows how to talk to one source — a filesystem directory, an email account, a messaging app, a web browser — and translate what it finds into OmniData resources.

The contract

Every adapter implements two methods:

  • sync() — Discover new or changed items from the source. For each item, create or update a resource record in omnidata_resources (inside index.db) and optionally store raw content in the blobs/ directory. Returns a SyncResult.
  • read_content(resource) — Given a resource, extract its text content for chunking. This is called during pipeline promotion from bronze to silver.

The separation matters. sync() handles discovery and metadata. read_content() handles extraction. They run at different times and may be called by different workers.
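A minimal sketch of the two-method contract, assuming a Python interface. The Resource class, the trimmed-down SyncResult, and InMemoryNotesAdapter are hypothetical stand-ins for illustration, not OmniData's actual types. The index dict plays the role of omnidata_resources.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Resource:
    # Hypothetical stand-in for a row in omnidata_resources.
    uri: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class SyncResult:
    # Trimmed to the counters this sketch needs.
    created: int = 0
    skipped: int = 0


class Adapter(Protocol):
    """The two methods every adapter is described as implementing."""

    def sync(self) -> SyncResult: ...

    def read_content(self, resource: Resource) -> str: ...


class InMemoryNotesAdapter:
    """Toy adapter whose 'source' is a dict of note name -> note text."""

    def __init__(self, notes: dict[str, str]) -> None:
        self._notes = notes
        self.index: dict[str, Resource] = {}  # stand-in for index.db

    def sync(self) -> SyncResult:
        # Discovery and metadata only; no text extraction happens here.
        result = SyncResult()
        for name in self._notes:
            uri = f"notes://{name}"
            if uri in self.index:
                result.skipped += 1
            else:
                self.index[uri] = Resource(uri=uri)
                result.created += 1
        return result

    def read_content(self, resource: Resource) -> str:
        # Extraction runs later, possibly in a different worker.
        name = resource.uri[len("notes://"):]
        return self._notes[name]
```

Calling sync() twice on the same adapter creates the resource once and skips it the second time; read_content() is only needed when the text is actually required.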

Three adapter layers

Adapters are discovered and loaded in priority order:

1. Builtin adapters

Ship with the OmniData runtime. Cover universal sources: filesystem, clipboard, manual input. Always available, no installation required.

2. Entry point adapters

Installed as Python packages that register via the omnidata.adapters entry point group. Discovered automatically at runtime. This is the standard distribution mechanism for third-party adapters — publish to PyPI, install with pip, and the runtime finds them.

3. Local adapters

Python files placed in a designated local directory (typically ~/.config/omnidata/adapters/). Useful for personal or experimental adapters that don’t warrant a package. Loaded last, so a local adapter can override an entry point adapter of the same name.
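The three layers could be resolved roughly as follows. This is a sketch, not the runtime's loader: the function names are hypothetical, and it assumes builtins keep their names on a collision, which the text above does not specify. Entry point discovery uses the standard library's importlib.metadata; local files are loaded with importlib.util.

```python
import importlib.util
from importlib.metadata import entry_points
from pathlib import Path


def load_entry_point_adapters(group: str = "omnidata.adapters") -> dict:
    """Collect objects registered under the given entry point group."""
    eps = entry_points()
    # Python 3.10+ exposes select(); older versions return a plain dict.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


def load_local_adapters(directory: Path) -> dict:
    """Import every *.py file in the directory, keyed by file stem."""
    adapters = {}
    if not directory.is_dir():
        return adapters
    for path in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(
            f"omnidata_local_{path.stem}", path
        )
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        adapters[path.stem] = module
    return adapters


def resolve_adapters(builtin: dict, entry_point: dict, local: dict) -> dict:
    """Merge the layers by adapter name."""
    resolved = dict(entry_point)
    resolved.update(local)    # local overrides entry point (per the text)
    resolved.update(builtin)  # assumption: builtins are never shadowed
    return resolved
```

A runtime following this shape would call resolve_adapters(builtins, load_entry_point_adapters(), load_local_adapters(Path.home() / ".config/omnidata/adapters")).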

Idempotency patterns

Adapters must be safe to run repeatedly. OmniData supports three idempotency strategies:

  • Watermark — The adapter stores a high-water mark (timestamp, cursor, page token) in its registry state. Each sync resumes from where it left off. Best for APIs with chronological ordering.
  • Content-hash — The adapter computes a SHA-256 hash of the content. If a resource with that content_hash already exists, the adapter skips the item. Best for content that may be re-encountered (files, web pages).
  • Existence — The adapter checks whether a resource with a given URI already exists. Simplest strategy, suitable when URIs are stable and unique.

These patterns can be combined. A filesystem adapter might use watermark (modified-time) for discovery and content-hash for deduplication.
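The combination described above can be sketched like this. Item and sync_once are hypothetical names, and the in-memory set of hashes stands in for a content_hash lookup against the index. The watermark filters out items that have not changed since the last run; the SHA-256 check then drops items whose content is already stored.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Item:
    uri: str
    modified_at: float  # e.g. file mtime as a Unix timestamp
    content: bytes


def sync_once(items, watermark, known_hashes):
    """Return (created_uris, skipped_count, new_watermark).

    known_hashes is mutated so that a later run sees the same hashes.
    """
    created = []
    skipped = 0
    new_watermark = watermark
    for item in items:
        # Watermark: ignore anything not modified since the last sync.
        if item.modified_at <= watermark:
            skipped += 1
            continue
        new_watermark = max(new_watermark, item.modified_at)
        # Content-hash: skip content that is already stored.
        digest = hashlib.sha256(item.content).hexdigest()
        if digest in known_hashes:
            skipped += 1
            continue
        known_hashes.add(digest)
        created.append(item.uri)
    return created, skipped, new_watermark
```

Running sync_once a second time with the returned watermark and the same hash set creates nothing, which is the property the adapter needs. One caveat of this sketch: the strict modified_at <= watermark comparison would miss a new item that shares a timestamp with the watermark, so a real adapter may need a tie-breaker.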

SyncResult

Every sync() call returns a SyncResult containing:

Field      Type  Description
created    int   Number of new resources created
updated    int   Number of existing resources updated
skipped    int   Number of items skipped (already current)
errors     list  Any errors encountered, with context
watermark  any   Updated watermark value to persist in adapter state

The runtime uses SyncResult to update the adapter registry and report sync health.
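As an illustration, the fields above map naturally onto a dataclass. The SyncError shape, the ok property, the apply_to_state helper, and the "watermark" state key are assumptions made for this sketch; the source only states that errors carry context and that the watermark is persisted in adapter state.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SyncError:
    # Hypothetical shape for "errors with context".
    uri: str
    message: str


@dataclass
class SyncResult:
    created: int = 0
    updated: int = 0
    skipped: int = 0
    errors: list[SyncError] = field(default_factory=list)
    watermark: Any = None

    @property
    def ok(self) -> bool:
        """True when the sync finished without recorded errors."""
        return not self.errors


def apply_to_state(state: dict, result: SyncResult) -> dict:
    """Return a copy of the adapter state with the new watermark, if any."""
    new_state = dict(state)
    if result.watermark is not None:
        new_state["watermark"] = result.watermark
    return new_state
```

Returning a new dict rather than mutating the old one keeps the previous state available if writing the registry fails partway through.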

Registry

Each adapter’s configuration and state are stored in adapters.json at the root of the .omnidata bundle. This makes the bundle self-describing: you can inspect any .omnidata bundle and see exactly which adapters feed it, when they last ran, and what their current sync state is.

{
  "adapters": [
    {
      "id": "uuid-here",
      "adapter_name": "notes",
      "uri_scheme": "notes",
      "enabled": true,
      "sync_interval": 1800,
      "configuration": {"watch_dir": "~/Notes"},
      "state": {"last_sync_at": "2026-03-28T12:00:00Z"}
    }
  ]
}

Because adapters.json is a plain JSON file (not a SQL table), it can be read and edited with any tool — no SQLite required.
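For instance, a few lines of standard-library Python are enough to inspect the registry. The sketch below parses the sample above and lists enabled adapters that are due for a sync. It assumes sync_interval is expressed in seconds, which the sample does not state, and due_adapters is a hypothetical helper, not part of OmniData.

```python
import json
from datetime import datetime, timedelta, timezone

REGISTRY_JSON = """
{
  "adapters": [
    {
      "id": "uuid-here",
      "adapter_name": "notes",
      "uri_scheme": "notes",
      "enabled": true,
      "sync_interval": 1800,
      "configuration": {"watch_dir": "~/Notes"},
      "state": {"last_sync_at": "2026-03-28T12:00:00Z"}
    }
  ]
}
"""


def due_adapters(registry: dict, now: datetime) -> list[str]:
    """Names of enabled adapters whose sync interval has elapsed."""
    due = []
    for entry in registry["adapters"]:
        if not entry.get("enabled", False):
            continue
        last = entry.get("state", {}).get("last_sync_at")
        if last is None:
            due.append(entry["adapter_name"])  # never synced
            continue
        # fromisoformat() rejects a trailing "Z" before Python 3.11.
        last_dt = datetime.fromisoformat(last.replace("Z", "+00:00"))
        # Assumption: sync_interval is a number of seconds.
        if now - last_dt >= timedelta(seconds=entry["sync_interval"]):
            due.append(entry["adapter_name"])
    return due


registry = json.loads(REGISTRY_JSON)
now = datetime(2026, 3, 28, 12, 30, tzinfo=timezone.utc)
print(due_adapters(registry, now))
```

In practice the registry would be read from the bundle with json.load() on the adapters.json file rather than from an inline string.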