DeepSeek apps often keep answer quality high by sending broad context: retrieval chunks, old chat turns, tool logs, support history, and fallback-provider wrappers. That context can be useful, but it also pushes up latency and token spend when much of it is repeated or weakly related to the current query.
Adola compresses the final context block before the model call. Its Rose 1 model returns plain compressed text plus a receipt listing original tokens, output tokens, saved tokens, compression ratio, latency, and risk flags.
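To make the receipt numbers concrete, here is a minimal sketch of how saved tokens and compression ratio relate to the two token counts. The field names (`originalTokens`, `outputTokens`, `savedTokens`, `compressionRatio`) are hypothetical, and the ratio is assumed to mean output divided by original; the actual receipt may name or define these differently.

```javascript
// Hypothetical receipt fields: names are illustrative, not a documented schema.
function summarizeReceipt({ originalTokens, outputTokens }) {
  if (!Number.isFinite(originalTokens) || originalTokens <= 0) {
    throw new RangeError("originalTokens must be a positive number");
  }
  if (!Number.isFinite(outputTokens) || outputTokens < 0) {
    throw new RangeError("outputTokens must be a non-negative number");
  }
  const savedTokens = originalTokens - outputTokens;
  // Assumption: ratio = output / original, so 0.35 means 35% of the tokens remain.
  const compressionRatio = outputTokens / originalTokens;
  const savedPercent = Math.round((1 - compressionRatio) * 100);
  return { savedTokens, compressionRatio, savedPercent };
}

// Example: a 4,000-token context compressed to 1,400 tokens.
console.log(summarizeReceipt({ originalTokens: 4000, outputTokens: 1400 }));
```

Under that assumed definition, a lower ratio means more aggressive compression, which is worth weighing against any risk flags in the same receipt.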
Best fits
DeepSeek RAG
Compress retrieved passages before passing them into a DeepSeek chat or reasoning call.
Agent traces
Reduce previous steps, tool output, and scratch context before the next model hop.
Model routers
Use the same compression hop before DeepSeek, OpenAI, Anthropic, or a fallback provider.
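The model-router case can be sketched as a single compression step shared by a fallback chain. This is an illustration, not part of any Adola or DeepSeek SDK: `routeWithCompression`, the `compress` callback, and the `providers` array with `name` and `call` fields are all hypothetical names you would wire up to your own clients.

```javascript
// Sketch: one compression hop reused across every provider in a fallback chain.
// `compress` and each provider's `call` are hypothetical functions you supply.
async function routeWithCompression({ question, context, compress, providers }) {
  // Compress once; every provider attempt sees the same compressed context.
  const { output, receipt } = await compress({ question, context });
  const errors = [];
  for (const provider of providers) {
    try {
      const answer = await provider.call({ question, context: output });
      return { provider: provider.name, answer, receipt };
    } catch (error) {
      // Record the failure and fall through to the next provider.
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Compressing before the loop rather than inside it means a fallback attempt does not pay for a second compression call.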
Minimal server-side pattern
Keep provider keys on your server. Send the full context to Adola first, then pass the compressed output into the DeepSeek-compatible chat request.
export async function answerWithCompressedDeepSeek({ question, context }) {
  // Step 1: compress the full context with Adola before any model call.
  const compressed = await fetch("https://api.adola.app/v1/compress", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.ADOLA_API_KEY}`
    },
    body: JSON.stringify({
      model: "rose-1",
      query: question,
      input: context,
      compression: { target_ratio: 0.35, preserve_order: true }
    })
  }).then((response) => response.json());

  // Step 2: send the compressed text to the DeepSeek-compatible chat endpoint.
  const response = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`
    },
    body: JSON.stringify({
      model: "deepseek-chat",
      messages: [
        {
          role: "system",
          content: "Answer from the compressed context. Say when context is insufficient."
        },
        {
          role: "user",
          content: `Question: ${question}\n\nCompressed context:\n${compressed.output}`
        }
      ]
    })
  }).then((res) => res.json());

  // Return the model response together with the compression receipt.
  return { response, compressionReceipt: compressed.receipt };
}
Try the compression hop first
The public demo endpoint is capped and does not require a key. Run one real prompt through Rose 1 before changing your DeepSeek request path.
curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should DeepSeek answer?",
    "input": "Long retrieved context, tool trace, support ticket, or policy text...",
    "compression": { "target_ratio": 0.35, "preserve_order": true }
  }'
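The same demo request can be sent from JavaScript. The sketch below mirrors the curl command's URL, headers, and body, and adds an HTTP status check. The helper names `buildDemoRequest` and `tryDemoCompression` and the injectable `fetchImpl` parameter are illustrative choices, and the shape of the returned JSON is not assumed here; it is passed back as-is.

```javascript
// Build the same request that the curl command sends to the demo endpoint.
function buildDemoRequest(query, input) {
  return {
    url: "https://api.adola.app/v1/demo/compress",
    options: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        model: "rose-1",
        query,
        input,
        compression: { target_ratio: 0.35, preserve_order: true }
      })
    }
  };
}

// `fetchImpl` is injectable so the call can be exercised without network access.
async function tryDemoCompression(query, input, fetchImpl = fetch) {
  const { url, options } = buildDemoRequest(query, input);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    // Surface capped or failed demo calls instead of parsing an error body.
    throw new Error(`Demo compression failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A call such as `tryDemoCompression("What should DeepSeek answer?", longContext).then(console.log)` prints whatever the demo endpoint returns, which is enough to inspect the compressed text and receipt before touching the production request path.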