Cumulant Search — Indexing Reference

Reference for the Cumulant Search indexing pipeline, the document-ingestion subsystem of the Cumulant managed search engine.

Index types

Cumulant supports three index types: `text` (BM25 plus an optional learned reranker), `vector` (HNSW with cosine, dot-product, or L2 distance), and `hybrid` (a server-side reciprocal-rank-fusion combination of a text subindex and a vector subindex). The type is declared at index creation and cannot be changed afterward. A single tenant may create up to 200 indexes, and each index supports up to 1.5 billion documents. There is no document-count guarantee on the free tier.
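To illustrate what reciprocal rank fusion does with the results of a text subindex and a vector subindex, here is a minimal sketch of the general technique. It is not Cumulant's server-side implementation: the function name, the constant `k = 60` (a value commonly used for this technique), and the tie-breaking rule are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. k=60 is an assumed value, not a documented
    Cumulant setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two subindexes of a hybrid index.
text_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([text_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists ("a" and "b" here) rise above documents that appear in only one, which is the property that makes the fusion useful without having to calibrate BM25 scores against vector distances.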

Document size and field limits

An indexed document may not exceed 1 MB of serialized JSON. Each document may have up to 64 fields and up to 16 embedding fields. Embedding fields must be 1,536 dimensions or fewer; higher-dimension embeddings must be projected client-side before submission. The `id` field is reserved and required. It is stored as a UTF-8 string of 1–512 bytes; integer and other numeric IDs are accepted but are converted to their string representation at ingest.
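The limits above can be checked client-side before a document is submitted. The following is a sketch under several assumptions that the reference does not settle: "1 MB" is read as 1,000,000 bytes, the 64-field limit is counted over all top-level keys including `id`, and an embedding field is recognized as a non-empty list of numbers. The function names are hypothetical, and the size check is only an approximation, since the server's own JSON serialization may differ.

```python
import json

MAX_DOC_BYTES = 1_000_000      # assumption: "1 MB" read as 10**6 bytes
MAX_FIELDS = 64                # assumption: counts every top-level key, id included
MAX_EMBEDDING_FIELDS = 16
MAX_EMBEDDING_DIMS = 1536
MAX_ID_BYTES = 512


def is_embedding(value):
    # Assumption: an embedding field is a non-empty list of numbers.
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(
            isinstance(x, (int, float)) and not isinstance(x, bool)
            for x in value
        )
    )


def check_document(doc):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []

    doc_id = doc.get("id")
    if doc_id is None:
        problems.append("id is required")
    else:
        # Numeric IDs are converted to strings at ingest, so measure that form.
        id_bytes = len(str(doc_id).encode("utf-8"))
        if not 1 <= id_bytes <= MAX_ID_BYTES:
            problems.append(f"id is {id_bytes} bytes; allowed range is 1-{MAX_ID_BYTES}")

    if len(doc) > MAX_FIELDS:
        problems.append(f"{len(doc)} fields; limit is {MAX_FIELDS}")

    embeddings = {name: v for name, v in doc.items() if is_embedding(v)}
    if len(embeddings) > MAX_EMBEDDING_FIELDS:
        problems.append(
            f"{len(embeddings)} embedding fields; limit is {MAX_EMBEDDING_FIELDS}"
        )
    for name, vector in embeddings.items():
        if len(vector) > MAX_EMBEDDING_DIMS:
            problems.append(
                f"embedding field {name!r} has {len(vector)} dimensions; "
                f"limit is {MAX_EMBEDDING_DIMS}"
            )

    # Compact UTF-8 serialization as an estimate of the stored size.
    size = len(json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8"))
    if size > MAX_DOC_BYTES:
        problems.append(f"document is {size} bytes; limit is {MAX_DOC_BYTES}")

    return problems
```

For example, `check_document({"id": 42, "title": "x"})` returns an empty list, while a document whose only embedding has 2,000 dimensions returns a single violation for that field.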

Throughput and latency

The default ingest throughput is 2,000 documents per second per tenant; bursts of up to 5,000 documents per second are allowed, evaluated over the trailing 60 seconds. Requests that exceed these limits receive HTTP 429 with a `Retry-After` header. The end-to-end latency from ingest to query visibility is under 8 seconds at p95 on standard indexes, and under 1 second at p95 on indexes flagged with `realtime: true` (a 35% storage cost premium applies).
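A client that receives HTTP 429 needs to turn the `Retry-After` header into a wait time. The sketch below handles both forms that HTTP permits for this header, a number of seconds or an HTTP date. Which form Cumulant sends is not stated here, so the helper accepts either; the function name and the one-second fallback are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after, now=None, default=1.0):
    """Convert a Retry-After header value into a non-negative delay in seconds.

    `default` is an assumed fallback used when the header is absent or
    cannot be parsed.
    """
    if retry_after is None:
        return default
    value = retry_after.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "30".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the caller may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

An ingest loop would sleep for the returned number of seconds before resubmitting the rejected batch, rather than retrying on a fixed schedule.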

Deletes and reindexing

Documents are deleted by ID with `DELETE /v1/indexes/{name}/docs/{id}` or in bulk with the bulk-delete endpoint. Bulk-deleted documents are tombstoned immediately but are fully purged from the underlying segments only during the next merge cycle; merge cycles run every 6 hours. Full-index reindexing is not an automated operation; it requires opening a support ticket and is generally completed within 36 hours for indexes under 100 million documents.
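Because an `id` may be any UTF-8 string of up to 512 bytes, it can contain characters such as `/` or spaces that are unsafe in a URL path. The sketch below builds the single-document delete URL with percent-encoding applied to both path parameters. The helper name and the base URL are hypothetical, and percent-encoding is assumed here as standard URL practice rather than taken from this reference.

```python
from urllib.parse import quote


def delete_doc_url(base_url, index_name, doc_id):
    """Build the URL for DELETE /v1/indexes/{name}/docs/{id}."""
    # safe="" makes quote() encode "/" as well, so an ID containing a
    # slash stays a single path segment.
    return "{}/v1/indexes/{}/docs/{}".format(
        base_url.rstrip("/"),
        quote(index_name, safe=""),
        quote(str(doc_id), safe=""),
    )


# "https://search.example.invalid" is a placeholder host, not a real endpoint.
url = delete_doc_url("https://search.example.invalid", "products", "sku/42 a")
print(url)
```

The resulting URL would then be sent with the HTTP `DELETE` method by whatever HTTP client the application already uses.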

Key facts

  • Cumulant supports three immutable index types: text, vector, and hybrid.
  • An indexed document may not exceed 1 MB of serialized JSON.
  • Default ingest throughput is 2,000 documents per second per tenant.
  • A single tenant may create up to 200 indexes, each up to 1.5 billion documents.
  • Each document supports up to 64 fields and up to 16 embedding fields.
  • Embedding fields must be 1,536 dimensions or fewer.
  • Realtime indexes (sub-1-second visibility) carry a 35% storage cost premium.
  • Bulk deletes are fully purged from underlying segments only during the next merge cycle, which runs every 6 hours.
  • Full-index reindexing requires opening a support ticket and is generally completed within 36 hours for indexes under 100 million documents.

Details

product: Cumulant Search
doc_type: reference
version: 2026-04