LlamaIndex applications often retrieve more context than the answer needs. That is useful for recall, but it means repeated headers, neighboring chunks, old memory, and low-value tool notes can reach every expensive model call.
Adola gives you a pre-model compression hop: send the user query plus the context you would normally pass to the response synthesizer, and Rose 1 (model id rose-1) returns compressed plain text plus a receipt reporting token savings and latency.
Where to insert it
Retriever output
Compress the final joined node text after retrieval and reranking, before synthesis.
Chat engines
Reduce retrieved memory, tool notes, and prior context before the response model call.
Query engines
Keep your index unchanged and add compression only at the prompt assembly boundary.
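Chat engines usually have more than retrieved nodes to compress. A minimal sketch follows. It assumes retrieved node text, recent chat turns, and tool notes are already available as plain strings. The hypothetical buildCompressionInput and buildCompressionRequest helpers label each source, join the sources in a fixed order, and wrap the result in the request body shape used elsewhere on this page. The helper names and section labels are illustrative and are not part of Adola or LlamaIndex.

```javascript
// Hypothetical helper: assemble labeled context from several sources
// into one string for the request body's `input` field.
function buildCompressionInput({ nodes = [], memory = [], toolNotes = [] }) {
  const sections = [];

  // Retrieved node text, numbered like the server-side example.
  if (nodes.length > 0) {
    sections.push(
      nodes.map((text, index) => `[node ${index + 1}] ${text}`).join("\n\n")
    );
  }

  // Prior chat turns, oldest first. Each turn is { role, content }.
  if (memory.length > 0) {
    sections.push(
      "[memory]\n" +
        memory.map((turn) => `${turn.role}: ${turn.content}`).join("\n")
    );
  }

  // Tool output notes, one per line.
  if (toolNotes.length > 0) {
    sections.push("[tool notes]\n" + toolNotes.join("\n"));
  }

  // Empty sections are skipped, so no bare labels reach the compressor.
  return sections.join("\n\n");
}

// Hypothetical helper: wrap the assembled context in the request body
// shape used by the examples on this page.
function buildCompressionRequest(query, sources, targetRatio = 0.35) {
  return {
    model: "rose-1",
    query,
    input: buildCompressionInput(sources),
    compression: { target_ratio: targetRatio, preserve_order: true }
  };
}
```

Keeping the sources in a fixed, labeled order makes the compressed output easier to read and debug, because you can still tell which part came from memory and which came from retrieval.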
Minimal server-side pattern
Keep the Adola API key on your server. The pattern works whether your LlamaIndex app uses a query engine, chat engine, or custom retriever pipeline.
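async function answerWithCompressedLlamaIndex({ query, nodes, llm }) {
  // Join the final (retrieved and reranked) nodes into one labeled context block.
  // Assumes each item exposes `.text`; if you pass retriever results
  // (NodeWithScore), map them to `result.node` first.
  const context = nodes
    .map((node, index) => `[node ${index + 1}] ${node.text}`)
    .join("\n\n");
  // Compress on the server so the Adola API key never reaches the client.
  const response = await fetch("https://api.adola.app/v1/compress", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.ADOLA_API_KEY}`
    },
    body: JSON.stringify({
      model: "rose-1",
      query,
      input: context,
      compression: { target_ratio: 0.35, preserve_order: true }
    })
  });
  // fetch does not reject on HTTP errors, so check the status explicitly.
  if (!response.ok) {
    throw new Error(`Adola compression failed: HTTP ${response.status}`);
  }
  const compressed = await response.json();
  // Send only the compressed context to the response model.
  const answer = await llm.complete({
    prompt: `Question: ${query}\n\nContext:\n${compressed.output}`
  });
  return { answer, compressionReceipt: compressed.receipt };
}
Try one retrieved context block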
The capped demo endpoint does not need a key. Use it to test one real set of retrieved nodes before creating a workspace.
curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "Which retrieved node answers the user question?",
    "input": "Long LlamaIndex retrieved node text, memory, or tool context...",
    "compression": { "target_ratio": 0.35, "preserve_order": true }
  }'
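When you move from the demo endpoint to the authenticated one, decide what the answer path should do if the compression call fails or is slow. A minimal sketch follows. It assumes the /v1/compress request and response shapes used in the server-side pattern (output, receipt). The hypothetical compressOrFallback helper aborts after a timeout you choose and returns the original context if the call fails. This way, a compression outage degrades to a longer prompt instead of a failed answer. The helper name, the timeoutMs default, and the injectable fetchImpl parameter are illustrative assumptions, not part of the Adola API.

```javascript
// Hypothetical helper: call the compression endpoint, but fall back to the
// uncompressed context on HTTP errors, network errors, or timeouts.
async function compressOrFallback({
  query,
  context,
  apiKey = process.env.ADOLA_API_KEY,
  timeoutMs = 2000, // illustrative budget; tune for your latency target
  fetchImpl = fetch // injectable so the helper can be tested without a network
}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    const response = await fetchImpl("https://api.adola.app/v1/compress", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${apiKey}`
      },
      body: JSON.stringify({
        model: "rose-1",
        query,
        input: context,
        compression: { target_ratio: 0.35, preserve_order: true }
      }),
      signal: controller.signal
    });

    if (!response.ok) {
      throw new Error(`compression failed with HTTP ${response.status}`);
    }

    const result = await response.json();
    // Treat a missing or non-string output as a failure rather than
    // sending an empty context to the model.
    if (typeof result.output !== "string") {
      throw new Error("compression response has no output text");
    }
    return { context: result.output, receipt: result.receipt, compressed: true };
  } catch (error) {
    // Any failure, including an abort from the timeout, keeps the original context.
    return { context, receipt: null, compressed: false, error };
  } finally {
    clearTimeout(timer);
  }
}
```

The compressed flag lets you log how often the fallback path runs. Watch that rate before you depend on compression for cost control.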