Where the API sits
Keep retrieval, reranking, and tool execution unchanged. Send Adola the final context block and the user query immediately before the model call. The model receives the compressed output; your logs keep the receipt.
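A minimal Python sketch of that placement follows. The step names (retrieve, rerank, compress, call_model, log_receipt) are hypothetical stand-ins for your own pipeline functions. The assumption that the compression step returns a dict with "output" and "receipt" keys is for illustration only, so check the real response shape before relying on it.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signatures for the pipeline steps you already have.
Retrieve = Callable[[str], List[str]]
Rerank = Callable[[str, List[str]], List[str]]
# Assumption: your compression client returns {"output": str, "receipt": dict}.
Compress = Callable[[str, str], Dict[str, Any]]
CallModel = Callable[[str, str], str]
LogReceipt = Callable[[Dict[str, Any]], None]


def answer(
    query: str,
    retrieve: Retrieve,
    rerank: Rerank,
    compress: Compress,
    call_model: CallModel,
    log_receipt: LogReceipt,
) -> str:
    # Retrieval and reranking stay exactly as they were.
    passages = rerank(query, retrieve(query))
    # Assemble the final context block the model would otherwise read.
    context = "\n\n".join(passages)
    # Compress immediately before the model call.
    result = compress(query, context)
    # Keep the receipt in your own logs.
    log_receipt(result["receipt"])
    # The model sees only the compressed context.
    return call_model(query, result["output"])
```

Because compression is the last step before the model call, you can swap it out or bypass it for an A/B comparison without touching retrieval or tool execution.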
Try the public endpoint
The demo endpoint caps input length but accepts the same request shape as the production endpoint. Use it to test one real RAG prompt or agent trace before you create a key.
curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should the assistant answer?",
    "input": "Long retrieved context, ticket history, or agent trace...",
    "compression": {
      "target_ratio": 0.35,
      "preserve_order": true
    }
  }'
API surfaces
- POST https://api.adola.app/v1/demo/compress
  No key required; input is capped. Good for testing a single prompt.
- POST https://api.adola.app/v1/compress
  Requires a Bearer key; includes receipts and project usage logs.
- POST https://api.adola.app/v1/batch/compress
  Compresses many prompts for offline evaluation.
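The Python sketch below builds the same request body as the curl example using only the standard library. It targets the production endpoint when a key is supplied and falls back to the demo endpoint otherwise. That routing rule is a convenience of this sketch, not something the API requires. The response is assumed to be a JSON object, and no particular response fields are assumed.

```python
import json
import urllib.request
from typing import Optional

# Endpoints as listed above.
DEMO_URL = "https://api.adola.app/v1/demo/compress"
PROD_URL = "https://api.adola.app/v1/compress"


def build_compress_request(
    query: str,
    context: str,
    api_key: Optional[str] = None,
    target_ratio: float = 0.35,
    preserve_order: bool = True,
) -> urllib.request.Request:
    """Build a POST with the same body shape as the demo curl example."""
    body = {
        "model": "rose-1",
        "query": query,
        "input": context,
        "compression": {
            "target_ratio": target_ratio,
            "preserve_order": preserve_order,
        },
    }
    headers = {"content-type": "application/json"}
    if api_key:
        # Production endpoint: authenticate with a Bearer key.
        headers["Authorization"] = f"Bearer {api_key}"
        url = PROD_URL
    else:
        # No key: use the capped demo endpoint.
        url = DEMO_URL
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    # Assumption: the endpoint returns a JSON object; inspect it before
    # depending on specific response fields.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, pass a key read from a hypothetical ADOLA_API_KEY environment variable to build_compress_request, then hand the result to send. Without a key, the same call goes to the demo endpoint, so one code path covers both testing and production.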
Good fits
- RAG systems that retrieve more context than the final model should read.
- Agent loops that accumulate tool traces, observations, and prior planning turns.
- Support copilots that merge ticket history, policy docs, account notes, and drafts.
- Model gateways that need smaller prompts without changing model providers.
Receipt fields to monitor
Store the original token count, output token count, tokens saved, compression ratio, latency, and risk flags alongside the downstream model request. With those fields logged together, you can audit what compression did when an answer is weak or when a source should have been preserved.
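A minimal Python sketch of such a log record follows. The field names are hypothetical and should be mapped from whatever the API actually returns. Compression ratio is assumed here to mean output tokens divided by original tokens, matching how a target_ratio of 0.35 reads, but that definition is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CompressionReceipt:
    # Hypothetical field names; map them from the real receipt.
    original_tokens: int
    output_tokens: int
    latency_ms: float
    risk_flags: List[str] = field(default_factory=list)

    @property
    def tokens_saved(self) -> int:
        return self.original_tokens - self.output_tokens

    @property
    def compression_ratio(self) -> float:
        # Assumption: ratio = output / original. An empty input is
        # treated as "no compression" to avoid dividing by zero.
        if self.original_tokens == 0:
            return 1.0
        return self.output_tokens / self.original_tokens


def log_line(request_id: str, model: str, receipt: CompressionReceipt) -> str:
    """One JSON log line tying the receipt to the downstream model call."""
    record = {
        "request_id": request_id,
        "downstream_model": model,
        **asdict(receipt),  # dataclass fields only; properties added below
        "tokens_saved": receipt.tokens_saved,
        "compression_ratio": round(receipt.compression_ratio, 4),
    }
    return json.dumps(record, sort_keys=True)
```

With 10,000 original tokens and 3,500 output tokens, tokens_saved is 6,500 and compression_ratio is 0.35. Writing the request ID and downstream model on the same line as the receipt makes it easy to pull up the compression details when reviewing a weak answer.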