Adapters
Adapters are the ingress layer of OmniData. Each adapter knows how to talk to one source — a filesystem directory, an email account, a messaging app, a web browser — and translate what it finds into OmniData resources.
The contract
Every adapter implements two methods:
- sync(): Discover new or changed items from the source. For each item, create or update a resource record in omnidata_resources (inside index.db) and optionally store raw content in the blobs/ directory. Returns a SyncResult.
- read_content(resource): Given a resource, extract its text content for chunking. This is called during pipeline promotion from bronze to silver.
The separation matters. sync() handles discovery and metadata. read_content() handles extraction. They run at different times and may be called by different workers.
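The two-method contract can be sketched as an abstract base class plus a toy implementation. This is a minimal illustration, not the OmniData API: the names Adapter, Resource, and InMemoryAdapter are hypothetical, and sync() returns a plain dict here as a stand-in for a SyncResult.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Resource:
    # Hypothetical stand-in for a row in omnidata_resources.
    uri: str
    metadata: dict = field(default_factory=dict)


class Adapter(ABC):
    """Hypothetical base class; the real OmniData interface may differ."""

    @abstractmethod
    def sync(self):
        """Discover new or changed items and record them as resources."""

    @abstractmethod
    def read_content(self, resource: Resource) -> str:
        """Return the text content of one resource for chunking."""


class InMemoryAdapter(Adapter):
    """Toy adapter over a dict of {uri: text}, for illustration only."""

    def __init__(self, items: dict):
        self.items = items
        self.resources = {}

    def sync(self) -> dict:
        created = 0
        for uri, text in self.items.items():
            if uri not in self.resources:
                # Discovery records metadata only; no text is extracted here.
                self.resources[uri] = Resource(uri, {"size": len(text)})
                created += 1
        return {"created": created}  # stand-in for a SyncResult

    def read_content(self, resource: Resource) -> str:
        # Extraction happens later, possibly in a different worker.
        return self.items[resource.uri]
```

Keeping sync() free of text extraction is what lets discovery stay cheap while the heavier read_content() work is deferred to promotion time.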
Three adapter layers
Adapters are discovered and loaded in priority order:
1. Builtin adapters
Ship with the OmniData runtime. Cover universal sources: filesystem, clipboard, manual input. Always available, no installation required.
2. Entry point adapters
Installed as Python packages that register via the omnidata.adapters entry point group. Discovered automatically at runtime. This is the standard distribution mechanism for third-party adapters — publish to PyPI, install with pip, and the runtime finds them.
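Entry point discovery of this kind is usually built on the standard library's importlib.metadata. The sketch below shows one way a runtime could enumerate the omnidata.adapters group; the function name is hypothetical and the actual OmniData loader may work differently. A package would typically register itself under that group in the entry points section of its pyproject.toml.

```python
from importlib.metadata import entry_points


def discover_entry_point_adapters(group: str = "omnidata.adapters") -> dict:
    """Map entry point name -> loaded object for every installed adapter."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points support selection by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the registered object.
    return {ep.name: ep.load() for ep in selected}
```

If no installed package registers under the group, the function returns an empty dict.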
3. Local adapters
Python files placed in a designated local directory (typically ~/.config/omnidata/adapters/). Useful for personal or experimental adapters that don’t warrant a package. Because they are loaded last, a local adapter can override an entry point adapter of the same name.
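Loading loose Python files and letting them win on a name clash could look like the following. This is a sketch under assumptions: it treats the file stem as the adapter name and returns whole modules, neither of which is confirmed by OmniData; load_local_modules and resolve are hypothetical helpers.

```python
import importlib.util
from pathlib import Path


def load_local_modules(directory: Path) -> dict:
    """Import every *.py file in directory; return {file stem: module}."""
    modules = {}
    for path in sorted(directory.glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file's top-level code
        modules[path.stem] = module
    return modules


def resolve(entry_point: dict, local: dict) -> dict:
    """Local adapters are applied last, so they win on a name clash."""
    return {**entry_point, **local}
```

For example, resolve({"notes": packaged}, load_local_modules(Path.home() / ".config/omnidata/adapters")) would return the local notes module if a notes.py file exists there, and the packaged one otherwise.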
Idempotency patterns
Adapters must be safe to run repeatedly. OmniData supports three idempotency strategies:
- Watermark: The adapter stores a high-water mark (timestamp, cursor, page token) in its registry state. Each sync resumes from where it left off. Best for APIs with chronological ordering.
- Content-hash: The adapter computes a SHA-256 hash of the content. If a resource with that content_hash already exists, it skips. Best for content that may be re-encountered (files, web pages).
- Existence: The adapter checks whether a resource with a given URI already exists. Simplest strategy, suitable when URIs are stable and unique.
These patterns can be combined. A filesystem adapter might use watermark (modified-time) for discovery and content-hash for deduplication.
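The combined filesystem case might be sketched as follows. The function sync_directory and its return shape are hypothetical; a real adapter would persist the watermark in its registry state and look up content_hash in index.db rather than in an in-memory set.

```python
import hashlib
from pathlib import Path


def sync_directory(root: Path, watermark: float, seen_hashes: set) -> tuple:
    """Return (new_paths, skipped_count, new_watermark) for one sync pass."""
    new_paths, skipped = [], 0
    new_watermark = watermark
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if mtime <= watermark:
            skipped += 1  # watermark: not modified since the last sync
            continue
        new_watermark = max(new_watermark, mtime)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen_hashes:
            skipped += 1  # content-hash: identical bytes already stored
            continue
        seen_hashes.add(digest)
        new_paths.append(path)
    return new_paths, skipped, new_watermark
```

Running the function twice with the returned watermark yields no new paths on the second pass, which is the property the idempotency requirement asks for. One caveat of the strict mtime comparison is that a file modified within the same timestamp tick as the watermark would be missed; a production adapter may want a small overlap window.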
SyncResult
Every sync() call returns a SyncResult containing:
| Field | Type | Description |
|---|---|---|
| created | int | Number of new resources created |
| updated | int | Number of existing resources updated |
| skipped | int | Number of items skipped (already current) |
| errors | list | Any errors encountered, with context |
| watermark | any | Updated watermark value to persist in adapter state |
The runtime uses SyncResult to update the adapter registry and report sync health.
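The fields in the table map naturally onto a dataclass. The following is a sketch of that shape, not the actual OmniData definition; the defaults and the ok property are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SyncResult:
    created: int = 0
    updated: int = 0
    skipped: int = 0
    # default_factory gives each result its own list instead of a shared one.
    errors: list = field(default_factory=list)
    watermark: Any = None

    @property
    def ok(self) -> bool:
        # Hypothetical convenience: a sync is healthy if nothing failed.
        return not self.errors
```

An adapter would then finish a pass with something like SyncResult(created=2, skipped=5, watermark="2026-03-28T12:00:00Z").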
Registry
Each adapter’s configuration and state are stored in adapters.json at the root of the .omnidata bundle. This means the container is self-describing — you can inspect any .omnidata bundle and know exactly which adapters feed it, when they last ran, and what their current sync state is.
    {
      "adapters": [
        {
          "id": "uuid-here",
          "adapter_name": "notes",
          "uri_scheme": "notes",
          "enabled": true,
          "sync_interval": 1800,
          "configuration": {"watch_dir": "~/Notes"},
          "state": {"last_sync_at": "2026-03-28T12:00:00Z"}
        }
      ]
    }
Because adapters.json is a plain JSON file (not a SQL table), it can be read and edited with any tool — no SQLite required.
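As an illustration of that point, the registry can be inspected with nothing but the standard library. The helper below is hypothetical and assumes only the field names shown in the example above.

```python
import json
from pathlib import Path


def enabled_adapters(bundle_root: Path) -> list:
    """Return (adapter_name, last_sync_at) for each enabled adapter."""
    text = (bundle_root / "adapters.json").read_text(encoding="utf-8")
    registry = json.loads(text)
    return [
        (entry["adapter_name"], entry.get("state", {}).get("last_sync_at"))
        for entry in registry.get("adapters", [])
        if entry.get("enabled", False)  # treat a missing flag as disabled
    ]
```

Called on a bundle containing the example registry, this would list the notes adapter together with its last sync timestamp.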