Not a drop-in replacement
Adola does not try to be the same interface as LLMLingua. Put Rose 1 after retrieval, reranking, or agent trace assembly, then pass the compressed text to the downstream model you already use.
Why teams use Adola instead
API instead of local library wiring
Use a hosted compression endpoint before the model call instead of owning model weights, runtime setup, and scaling in your app.
Receipts for production debugging
Each response includes original tokens, output tokens, saved tokens, compression ratio, latency, and risk flags.
Provider-neutral output
Rose 1 returns plain text that can feed OpenAI, Claude, DeepSeek, local models, or a model router.
Try the same idea on one prompt
The demo endpoint is intentionally small and no-key. Use it to compare a compressed context block against the prompt you would normally send to the model.
curl -s https://api.adola.app/v1/demo/compress \
-H 'content-type: application/json' \
--data '{
"model": "rose-1",
"query": "What should the assistant answer?",
"input": "Long RAG context, tool trace, policy text, or ticket history...",
"compression": { "target_ratio": 0.35, "preserve_order": true }
}'Best fits
- RAG systems where retrieved chunks are useful but too verbose.
- Agent runs where tool observations and prior turns inflate the next prompt.
- Support copilots where policies, account notes, and ticket history repeat boilerplate.
- Teams that want a compression hop without maintaining a research compressor in production.
When to stay with a local compressor
If you need fully offline execution, custom research control over compression internals, or benchmark-only experimentation, a local compressor may be a better fit. Adola is aimed at production apps that want a simple API hop and auditable request receipts.