Context compression API for LLM apps

Where the API sits

Keep retrieval, reranking, and tool execution unchanged. Send Adola the final context block and the user query immediately before the model call. The model receives the compressed output; your logs keep the receipt.

Retrieve → Assemble context → Compress → Call model → Log receipt
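
The placement described above can be sketched as a single function in which compression is the last step before the model call. This is a minimal illustration, not Adola's client: the function name, the injected callables, and the "output" and "receipt" keys on the compression result are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def answer_with_compression(
    query: str,
    retrieve: Callable[[str], List[str]],
    compress: Callable[[str, str], Dict],
    call_model: Callable[[str, str], str],
    log_receipt: Callable[[Dict], None],
) -> str:
    """Run one request with compression placed right before the model call."""
    # 1. Retrieval (plus any reranking or tool execution) stays unchanged.
    chunks = retrieve(query)
    # 2. Assemble the final context block exactly as before.
    context = "\n\n".join(chunks)
    # 3. Compress the assembled context together with the user query.
    #    Assumption: the result exposes "output" and "receipt" keys.
    result = compress(query, context)
    # 4. The model receives only the compressed output.
    answer = call_model(query, result["output"])
    # 5. The receipt goes to your own logs.
    log_receipt(result["receipt"])
    return answer
```

Passing the steps in as callables keeps the existing retrieval and model code untouched; only the compress and log steps are new.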

Try the public endpoint

The demo endpoint caps input size but uses the same request shape as production. Use it to test one real RAG prompt or agent trace before creating a key.

curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should the assistant answer?",
    "input": "Long retrieved context, ticket history, or agent trace...",
    "compression": {
      "target_ratio": 0.35,
      "preserve_order": true
    }
  }'
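
The same request can be built from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names build_demo_request and send are hypothetical, and the response is assumed to be JSON because its shape is not documented here.

```python
import json
import urllib.request

DEMO_URL = "https://api.adola.app/v1/demo/compress"


def build_demo_request(
    query: str, context: str, target_ratio: float = 0.35
) -> urllib.request.Request:
    """Build (but do not send) a demo compression request."""
    payload = {
        "model": "rose-1",
        "query": query,
        "input": context,
        "compression": {"target_ratio": target_ratio, "preserve_order": True},
    }
    return urllib.request.Request(
        DEMO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request. Assumption: the response body is a JSON object."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.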

API surfaces

Demo

POST https://api.adola.app/v1/demo/compress

No key, capped input, good for one prompt.

Production

POST https://api.adola.app/v1/compress

Bearer key, receipts, and project usage logs.

Batch

POST https://api.adola.app/v1/batch/compress

Compress many prompts for offline evaluation.
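
For the production surface, the only stated difference from the demo is the bearer key. A minimal sketch of the headers, assuming the key is sent in the conventional "Authorization: Bearer" form and read from a hypothetical ADOLA_API_KEY environment variable (neither detail is confirmed on this page):

```python
import os
from typing import Dict, Mapping

PRODUCTION_URL = "https://api.adola.app/v1/compress"


def production_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return request headers for the production endpoint."""
    # ADOLA_API_KEY is a hypothetical variable name chosen for this example.
    key = env.get("ADOLA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ADOLA_API_KEY is not set")
    return {
        "content-type": "application/json",
        # Assumption: the key uses the standard Bearer authorization scheme.
        "authorization": f"Bearer {key}",
    }
```

Reading the key from the environment keeps it out of source control and lets demo and production calls share the same request body.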

Good fits

RAG systems that retrieve more context than the final model should read.
Agent loops that accumulate tool traces, observations, and prior planning turns.
Support copilots that merge ticket history, policy docs, account notes, and drafts.
Model gateways that need smaller prompts without changing model providers.

Receipt fields to monitor

Store original tokens, output tokens, tokens saved, compression ratio, latency, and risk flags next to the downstream model request. That makes compression behavior auditable when an answer is weak or when a source that should have been preserved was dropped.
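
One way to keep these fields next to the model request is a small record type. The sketch below is illustrative only: the class name, field names, and request_id key are hypothetical, and the ratio is assumed to mean output tokens divided by original tokens, in line with the target_ratio of 0.35 in the request example. If the receipt already reports tokens saved and the ratio, store those values as returned instead of deriving them.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CompressionReceipt:
    request_id: str        # hypothetical id of the downstream model request
    original_tokens: int
    output_tokens: int
    latency_ms: float
    risk_flags: List[str] = field(default_factory=list)

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.output_tokens

    @property
    def compression_ratio(self) -> float:
        # Assumption: ratio = output tokens / original tokens.
        if self.original_tokens == 0:
            return 1.0
        return self.output_tokens / self.original_tokens

    def to_log_record(self) -> dict:
        """Flatten the receipt into one dict suitable for a log line."""
        record = asdict(self)
        record["tokens_saved"] = self.tokens_saved
        record["compression_ratio"] = round(self.compression_ratio, 4)
        return record
```

A usage example: CompressionReceipt("req-1", 2000, 700, 41.5).to_log_record() yields a single dict that can be written alongside the model request in whatever log store you already use.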