OpenAI prompt compression before Responses API calls

Many OpenAI apps send more input context than the model needs: broad retrieval chunks, tool output, chat history, policy text, and duplicated wrapper content. Rose 1 (model rose-1 on the Adola API) trims that context before the more expensive OpenAI call.

Adola returns the compressed context as plain text, so the rest of the integration stays provider-neutral. Each response also carries a receipt with original tokens, output tokens, tokens saved, compression ratio, latency, and risk flags for debugging and cost review.
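To make the receipt concrete, here is a minimal sketch that turns one into a short cost summary. The key names (original_tokens, output_tokens, latency_ms, risk_flags) are assumptions for illustration only; this page lists what the receipt contains but not its exact keys, so check them against a real response before relying on them.

```javascript
// Hypothetical receipt shape: the key names below are assumed, not confirmed by this page.
function summarizeReceipt(receipt) {
  const original = receipt.original_tokens;
  const output = receipt.output_tokens;
  const saved = original - output;
  // Share of input tokens removed, guarded against an empty input.
  const savedShare = original > 0 ? saved / original : 0;
  return {
    saved,
    savedPercent: Math.round(savedShare * 100),
    latencyMs: receipt.latency_ms,
    // Surface any risk flag so flagged requests can be reviewed by hand.
    flagged: Array.isArray(receipt.risk_flags) && receipt.risk_flags.length > 0
  };
}

const summary = summarizeReceipt({
  original_tokens: 4000,
  output_tokens: 1400,
  latency_ms: 120,
  risk_flags: []
});
console.log(summary);
// { saved: 2600, savedPercent: 65, latencyMs: 120, flagged: false }
```

Recomputing the saving from the two token counts, rather than trusting a single ratio field, gives a cross-check that stays meaningful if the ratio's definition is unclear.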

Assemble context → Compress → Call OpenAI → Return answer → Log receipt

Best fits

RAG answers

Compress retrieved documents before placing them in the OpenAI input context.
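One way to prepare retrieved documents for compression is to join them into a single ordered string first. The sketch below assumes a hypothetical chunk shape of { source, text }; your retriever's fields will likely differ. It drops empty chunks and exact duplicates and keeps the retrieval order.

```javascript
// Hypothetical chunk shape: { source: string, text: string }.
function buildRagContext(chunks) {
  const seen = new Set();
  const parts = [];
  for (const chunk of chunks) {
    const text = chunk.text.trim();
    // Skip empty chunks and exact duplicates returned by overlapping retrievers.
    if (text === "" || seen.has(text)) continue;
    seen.add(text);
    // Label each chunk with its source so the answer can still be traced back.
    parts.push(`[${chunk.source}]\n${text}`);
  }
  return parts.join("\n\n");
}

const ragContext = buildRagContext([
  { source: "refunds.md", text: " Refunds take 5 days. " },
  { source: "faq.md", text: "Refunds take 5 days." },
  { source: "contact.md", text: "Contact support." }
]);
console.log(ragContext);
```

The resulting string is what would be passed as the input field of the compress request.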

Agent steps

Reduce tool traces and prior state before the next planning or response call.
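For agent loops, the trace usually has to be flattened to text before it can be compressed. This is a minimal sketch under the assumption that each step looks like { tool, input, output }; that shape is hypothetical and not something this page defines.

```javascript
// Hypothetical step shape: { tool: string, input: any, output: string | object }.
function traceToText(steps) {
  return steps
    .map((step, index) => {
      // Tool output may be structured, so serialize anything that is not already text.
      const output =
        typeof step.output === "string" ? step.output : JSON.stringify(step.output);
      return `Step ${index + 1}: ${step.tool}(${JSON.stringify(step.input)})\n${output}`;
    })
    .join("\n\n");
}

const traceText = traceToText([
  { tool: "search", input: { q: "refund policy" }, output: "3 documents found" },
  { tool: "lookup", input: { id: 42 }, output: { status: "open" } }
]);
console.log(traceText);
```

Numbering the steps keeps their sequence readable in the flattened text, which matters when the next planning call depends on what happened first.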

Support replies

Shrink tickets, policies, account notes, and previous replies before drafting an answer.

Minimal server-side pattern

Keep both API keys on your server. The downstream OpenAI request receives only the compressed context, while your logs keep the full receipt for measurement.
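Keeping the receipt in your logs can be as simple as one structured line per request. The sketch below treats the receipt as an opaque object; the event name and the requestId field are illustrative choices, not part of any Adola format.

```javascript
// Builds one JSON log line per compression call. The receipt is stored as-is.
function receiptLogLine(requestId, receipt, now = new Date()) {
  return JSON.stringify({
    event: "compression_receipt", // illustrative event name
    requestId,
    at: now.toISOString(),
    receipt
  });
}

const logLine = receiptLogLine(
  "req-123",
  { note: "placeholder receipt" },
  new Date("2024-01-01T00:00:00.000Z")
);
console.log(logLine);
```

Log only the receipt and your own request identifier. The API keys and the raw context should stay out of the log line.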

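import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function answerWithCompressedContext({ question, context }) {
  // 1. Compress the assembled context with Rose 1 before the OpenAI call.
  const compressResponse = await fetch("https://api.adola.app/v1/compress", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.ADOLA_API_KEY}`
    },
    body: JSON.stringify({
      model: "rose-1",
      query: question,
      input: context,
      compression: { target_ratio: 0.35, preserve_order: true }
    })
  });

  // fetch() does not reject on 4xx/5xx responses, so fail loudly here
  // instead of passing an undefined context to the model.
  if (!compressResponse.ok) {
    throw new Error(`Adola compression failed with HTTP ${compressResponse.status}`);
  }
  const compressed = await compressResponse.json();

  // 2. Send only the compressed context to OpenAI.
  const response = await openai.responses.create({
    model: "gpt-4.1-mini",
    input: [
      {
        role: "system",
        content: "Answer from the compressed context. Say when context is insufficient."
      },
      {
        role: "user",
        content: `Question: ${question}\n\nContext:\n${compressed.output}`
      }
    ]
  });

  // 3. Return the answer together with the receipt so the caller can log it.
  return {
    answer: response.output_text,
    compressionReceipt: compressed.receipt
  };
}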
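If the compression call fails or returns nothing usable, one option is to fall back to the original context so the user still gets an answer. The sketch below shows that design choice and is not documented Adola behaviour. The compress argument is a hypothetical function you supply; it should resolve to an object with output and receipt fields, as the compress response is used above.

```javascript
// Wraps any compress function and falls back to the uncompressed context on failure.
async function compressOrFallback(compress, { question, context }) {
  try {
    const result = await compress({ question, context });
    // Accept the result only if it contains non-empty text.
    if (typeof result.output === "string" && result.output.trim() !== "") {
      return { context: result.output, receipt: result.receipt ?? null, compressed: true };
    }
  } catch (error) {
    console.warn("compression failed, using original context:", String(error));
  }
  // Fallback: the request costs more tokens but still succeeds.
  return { context, receipt: null, compressed: false };
}
```

Returning the compressed flag lets your logs distinguish requests that actually saved tokens from those that fell back, which keeps cost measurements honest.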

Try it without an Adola key

Use the capped demo endpoint with one real prompt before creating a workspace or changing your OpenAI code path.

curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should the OpenAI call answer?",
    "input": "Long retrieved context, agent trace, support ticket, or policy text...",
    "compression": { "target_ratio": 0.35, "preserve_order": true }
  }'