DeepSeek prompt compression before model calls

DeepSeek apps often keep answer quality high by sending broad context: retrieval chunks, old chat turns, tool logs, support history, and fallback-provider wrappers. That context can be useful, but it also pushes up latency and token spend when much of it is repeated or weakly related to the current query.

Adola compresses the final context block before the model call. Its Rose 1 model returns the compressed context as plain text, plus a receipt listing original tokens, output tokens, saved tokens, compression ratio, latency, and risk flags.
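To make the receipt concrete, here is a minimal sketch of turning one into a log line. The field names (`original_tokens`, `output_tokens`, `risk_flags`) are assumptions for illustration and may not match the real receipt schema. The sketch also assumes the compression ratio means output tokens divided by original tokens.

```javascript
// Hypothetical receipt shape: these field names are assumed, not taken from an API reference.
function summarizeReceipt(receipt) {
  const original = receipt.original_tokens;
  const output = receipt.output_tokens;
  const savedTokens = original - output;
  // Assumed definition: ratio = output / original (lower means more compression).
  const ratio = original > 0 ? output / original : 1;
  const flags = receipt.risk_flags ?? [];
  return {
    savedTokens,
    ratio,
    hasRisk: flags.length > 0,
    line:
      `saved ${savedTokens} of ${original} tokens ` +
      `(ratio ${ratio.toFixed(2)}), flags: ${flags.join(",") || "none"}`
  };
}
```

Logging the `line` value next to each request ID is one way to check whether the savings hold up on real traffic.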

Assemble context -> Compress -> Call DeepSeek -> Return answer -> Log receipt

Best fits

DeepSeek RAG

Compress retrieved passages before passing them into a DeepSeek chat or reasoning call.
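As a rough sketch of the step before compression, retrieved passages can be joined into a single context string. The passage shape (`{ source, text }`) and the helper name `buildRagContext` are hypothetical. Dropping exact duplicates here is a design choice of this sketch, not something the compression service requires.

```javascript
// Join retrieved passages into one context string, dropping exact duplicates.
// The passage shape ({ source, text }) is hypothetical.
function buildRagContext(passages) {
  const seen = new Set();
  const parts = [];
  for (const { source, text } of passages) {
    const body = text.trim();
    if (body === "" || seen.has(body)) continue; // skip empty and repeated chunks
    seen.add(body);
    parts.push(`[${source}]\n${body}`);
  }
  return parts.join("\n\n");
}
```

The returned string would be sent as `input`, with the user's question as `query`.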

Agent traces

Reduce previous steps, tool output, and scratch context before the next model hop.
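One way to prepare an agent trace for the compression hop is to flatten prior steps into labeled text. This is a sketch only. The step shape (`{ tool, input, output }`) and the name `traceToContext` are assumptions, not part of any agent framework.

```javascript
// Flatten prior agent steps into one text block for the compression hop.
// The step shape ({ tool, input, output }) is hypothetical.
function traceToContext(steps) {
  return steps
    .map(
      (step, i) =>
        `Step ${i + 1} (${step.tool})\nInput: ${step.input}\nOutput: ${step.output}`
    )
    .join("\n\n");
}
```

Numbering the steps keeps their order visible in the text, which pairs naturally with `preserve_order: true`.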

Model routers

Use the same compression hop before DeepSeek, OpenAI, Anthropic, or a fallback provider.
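A router can share one compression hop across providers by taking the compressor and the provider calls as injected functions. The sketch below assumes a hypothetical `providers` map with an optional `fallback` entry, and a `compress` function that resolves to `{ output, receipt }`. None of these names come from a specific SDK.

```javascript
// Run one compression hop, then dispatch to whichever provider the router picks.
// `compress` and each provider function are injected, so this does not depend on any SDK.
async function routeWithCompression({ question, context, providerName, providers, compress }) {
  const call = providers[providerName] ?? providers.fallback;
  if (!call) {
    throw new Error(`No provider for "${providerName}" and no fallback configured`);
  }
  // Compress once, up front, so every provider receives the same reduced context.
  const compressed = await compress({ query: question, input: context });
  const answer = await call({ question, context: compressed.output });
  return { answer, receipt: compressed.receipt };
}
```

Because compression happens before dispatch, switching providers does not change what is sent to the compressor or what the receipt reports.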

Minimal server-side pattern

Keep provider keys on your server. Send the full context to Adola first, then pass the compressed output into the DeepSeek-compatible chat request.

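export async function answerWithCompressedDeepSeek({ question, context }) {
  // Step 1: compress the assembled context with Rose 1 before the model call.
  const compressionResponse = await fetch("https://api.adola.app/v1/compress", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.ADOLA_API_KEY}`
    },
    body: JSON.stringify({
      model: "rose-1",
      query: question,
      input: context,
      compression: { target_ratio: 0.35, preserve_order: true }
    })
  });
  // Fail loudly on HTTP errors instead of parsing an error body as a result.
  if (!compressionResponse.ok) {
    throw new Error(`Adola compression failed: HTTP ${compressionResponse.status}`);
  }
  const compressed = await compressionResponse.json();

  // Step 2: send the question plus the compressed context to DeepSeek.
  const deepseekResponse = await fetch("https://api.deepseek.com/chat/completions", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.DEEPSEEK_API_KEY}`
    },
    body: JSON.stringify({
      model: "deepseek-chat",
      messages: [
        {
          role: "system",
          content: "Answer from the compressed context. Say when context is insufficient."
        },
        {
          role: "user",
          content: `Question: ${question}\n\nCompressed context:\n${compressed.output}`
        }
      ]
    })
  });
  if (!deepseekResponse.ok) {
    throw new Error(`DeepSeek request failed: HTTP ${deepseekResponse.status}`);
  }
  const response = await deepseekResponse.json();

  // Return the raw DeepSeek payload with the receipt so callers can log the savings.
  return { response, compressionReceipt: compressed.receipt };
}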
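Since the receipt carries risk flags, a caller may want to fall back to the uncompressed context when the result looks unsafe to use. The following is a sketch under stated assumptions. `chooseContext` is a hypothetical helper, `risk_flags` is an assumed field name on the receipt, and treating any flag as a reason to skip compression is a deliberately conservative policy, not documented behavior.

```javascript
// Decide which context to send to DeepSeek.
// `risk_flags` on the receipt is an assumed field name.
function chooseContext(originalContext, compressed) {
  const output = compressed?.output;
  const flags = compressed?.receipt?.risk_flags ?? [];
  if (typeof output !== "string" || output.trim() === "") {
    // Nothing usable came back, so keep the full context.
    return { context: originalContext, usedCompression: false, reason: "empty-output" };
  }
  if (flags.length > 0) {
    // Conservative policy for this sketch: any flag means send the original.
    return { context: originalContext, usedCompression: false, reason: "risk-flags" };
  }
  return { context: output, usedCompression: true, reason: "ok" };
}
```

Recording `reason` alongside the receipt shows how often the fallback path runs.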

Try the compression hop first

The public demo endpoint is capped and does not require a key. Run one real prompt through Rose 1 before changing your DeepSeek request path.

curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should DeepSeek answer?",
    "input": "Long retrieved context, tool trace, support ticket, or policy text...",
    "compression": { "target_ratio": 0.35, "preserve_order": true }
  }'
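For a server that already uses `fetch`, the same demo request can be built in JavaScript. This sketch only constructs the request and reuses the URL and body fields from the curl command above. The helper name `buildDemoRequest` is hypothetical.

```javascript
// Build the same demo request as the curl command, without sending it.
function buildDemoRequest(query, input) {
  return {
    url: "https://api.adola.app/v1/demo/compress",
    init: {
      method: "POST",
      // No authorization header: the demo endpoint does not require a key.
      headers: { "content-type": "application/json" },
      body: JSON.stringify({
        model: "rose-1",
        query,
        input,
        compression: { target_ratio: 0.35, preserve_order: true }
      })
    }
  };
}
```

To send it, pass both parts to `fetch(url, init)` and parse the JSON response, as in the server-side pattern.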