RAG prompt compression quickstart

Keep retrieval generous

Fetch and rerank the same candidate chunks you would normally send to the model.

Compress only the final context

Send the user query plus the selected context block to Adola with a target ratio.

Pass plain text downstream

Use the returned output as model context and log the receipt next to the model request.

Minimal server-side example

Keep the Adola API key on your server. The browser or client app should never see it. The downstream model only receives the compressed text.

import OpenAI from "openai";
import { Adola } from "adola";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const adola = new Adola({ apiKey: process.env.ADOLA_API_KEY });

export async function answerWithCompressedRag(query, retrievedChunks) {
  // Label each chunk so the answer can still be traced back to its source.
  const retrievedContext = retrievedChunks
    .map((chunk, index) => `[source ${index + 1}] ${chunk.text}`)
    .join("\n\n");

  // Compress only the final context block; the query is sent along with it.
  const compressed = await adola.compress({
    model: "rose-1",
    query,
    input: retrievedContext,
    compression: { target_ratio: 0.3, preserve_order: true }
  });

  // The downstream model receives the compressed plain text, never the raw chunks.
  const response = await openai.responses.create({
    model: "gpt-4.1-mini",
    input: [
      {
        role: "system",
        content: "Answer from the compressed retrieval context. Say when context is insufficient."
      },
      {
        role: "user",
        content: `Question: ${query}\n\nContext:\n${compressed.output}`
      }
    ]
  });

  // Return the receipt with the answer so the caller can log both together.
  return {
    answer: response.output_text,
    compressionReceipt: compressed.receipt
  };
}
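
The example above lets a failed compression call reject the whole request. If you would rather answer from the uncompressed context in that case, a minimal sketch could look like the following. `compressOrFallback` and its `compressFn` parameter are hypothetical names used for illustration and are not part of the Adola SDK.

```javascript
// Hypothetical helper: try to compress, fall back to the original context on error.
// `compressFn` is any async function that resolves to { output, receipt }, for
// example (input) => adola.compress({ model: "rose-1", query, input }).
async function compressOrFallback(compressFn, retrievedContext) {
  try {
    const compressed = await compressFn(retrievedContext);
    return {
      context: compressed.output,
      receipt: compressed.receipt,
      compressed: true
    };
  } catch (error) {
    // Compression failed: keep the full context and record why.
    return {
      context: retrievedContext,
      receipt: null,
      compressed: false,
      error: String(error && error.message ? error.message : error)
    };
  }
}
```

Logging the `compressed` flag alongside the request makes it possible to tell later which answers were produced from uncompressed context.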

What to log

Log the model response, the original retrieval IDs, and the Adola receipt together. The receipt reports original tokens, output tokens, saved tokens, compression ratio, latency, and risk flags, which helps when you need to debug a bad answer.
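
One way to keep the model response, the retrieval IDs, and the receipt together is a single structured log record per request. The sketch below is an assumption about how you might shape that record: `buildRagLogEntry`, `requestId`, and `loggedAt` are hypothetical names, and the receipt is stored exactly as returned rather than assuming its field names.

```javascript
// Hypothetical log-record builder. Only the surrounding fields are defined here;
// the Adola receipt is stored verbatim.
function buildRagLogEntry({ requestId, retrievalIds, modelResponse, compressionReceipt }) {
  return {
    requestId,
    loggedAt: new Date().toISOString(),
    retrievalIds: [...retrievalIds], // IDs of the original, uncompressed chunks
    answer: modelResponse,
    compressionReceipt
  };
}

// Usage with placeholder values (not real output):
const entry = buildRagLogEntry({
  requestId: "req-example",
  retrievalIds: ["doc-1#3", "doc-7#1"],
  modelResponse: "placeholder answer",
  compressionReceipt: { note: "placeholder receipt" }
});
console.log(JSON.stringify(entry));
```

Writing the record as one JSON line keeps the receipt searchable next to the retrieval IDs that produced it.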

Retrieve → Rerank → Compress → Answer → Log receipt
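
The retrieve, rerank, compress, answer, and log-receipt stages can be wired together in one function. This is a sketch under the assumption that each stage is supplied as an async function; `runCompressedRag` and the names on the `steps` object are hypothetical placeholders for your own code.

```javascript
// Each stage is passed in as a function so the order of the pipeline is explicit.
async function runCompressedRag(query, steps) {
  const candidates = await steps.retrieve(query);              // 1. Retrieve
  const ranked = await steps.rerank(query, candidates);        // 2. Rerank
  const compressed = await steps.compress(query, ranked);      // 3. Compress
  const answer = await steps.answer(query, compressed.output); // 4. Answer
  await steps.logReceipt({                                     // 5. Log receipt
    query,
    ranked,
    answer,
    receipt: compressed.receipt
  });
  return answer;
}
```

Passing the stages in also makes it straightforward to swap the compress stage for a pass-through when comparing answers with and without compression.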

When to try it

Your retriever returns partly useful chunks with repeated wrapper text.
Your agent sends long tool traces into the next planning call.
Your support copilot includes old ticket history and policy docs.
You need provider-neutral text output instead of a model-specific gateway.