LangChain apps often start with generous context: retrieved documents, tool results, memory, policies, previous messages, and intermediate reasoning traces. Sending all of that downstream is simple, but it burns input tokens and can bury the answer-bearing text.
Put Adola immediately before the model node. Send the user task plus the context block you were going to pass to the LLM. Rose 1, Adola's compression model ("rose-1" in the requests below), returns compressed text plus a receipt for debugging and cost accounting.
Where to insert it
Retrieval chains
Compress the joined documents after retrieval and reranking, before the answer model sees them.
LangGraph agents
Compress tool output, prior turns, and scratchpad state before the next planning or response node.
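One way to picture this is a plain async function with the shape of a graph node: it reads state, flattens prior turns, tool output, and scratchpad notes into one context block, and returns the compressed text for the next node. This is a minimal sketch, not LangGraph's API. The state fields (`task`, `messages`, `toolResults`, `scratchpad`, `compressedContext`) and the injected `compress` function are hypothetical names chosen for illustration.

```javascript
// Hypothetical state shape: { task, messages, toolResults, scratchpad }.
// None of these field names come from LangGraph; adapt them to your graph state.
function assembleAgentContext(state) {
  const sections = [];
  if (state.messages?.length) {
    const turns = state.messages.map((m) => `${m.role}: ${m.content}`).join("\n");
    sections.push(`[prior turns]\n${turns}`);
  }
  if (state.toolResults?.length) {
    const tools = state.toolResults
      .map((t, i) => `[tool ${i + 1}: ${t.name}]\n${t.output}`)
      .join("\n\n");
    sections.push(tools);
  }
  if (state.scratchpad) {
    sections.push(`[scratchpad]\n${state.scratchpad}`);
  }
  return sections.join("\n\n");
}

// Returns a node-shaped function: takes state, returns a partial state update.
// `compress` is an injected async function ({ query, input }) => ({ output, receipt }),
// for example a thin wrapper around the Adola request in the minimal pattern.
function makeCompressNode(compress) {
  return async function compressNode(state) {
    const input = assembleAgentContext(state);
    // Nothing to compress: skip the network call entirely.
    if (!input) return { compressedContext: "", compressionReceipt: null };
    const { output, receipt } = await compress({ query: state.task, input });
    return { compressedContext: output, compressionReceipt: receipt ?? null };
  };
}
```

Injecting `compress` keeps the node testable without network access, and the planning or response node then reads `compressedContext` instead of the raw state.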
Model routers
Keep compression provider-neutral so the same reduced prompt can go to OpenAI, Anthropic, DeepSeek, or a local model.
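A sketch of what provider-neutral can look like: compress once, build one plain message array, and hand that same array to whichever model the router picks. The `models` map, the `pickModel` rule with its character threshold, and the injected `compress` function are all hypothetical. The only assumption about each model is that it exposes an `invoke(messages)` method, as LangChain chat models do.

```javascript
// Build the prompt once, in a form that does not depend on the provider.
function buildMessages(question, compressedText) {
  return [
    ["system", "Answer from the compressed context. Say when context is insufficient."],
    ["human", `Question: ${question}\n\nContext:\n${compressedText}`],
  ];
}

// Hypothetical routing rule: short compressed contexts go to the cheaper model.
function pickModel(models, compressedText, maxCheapChars = 4000) {
  return compressedText.length <= maxCheapChars ? models.cheap : models.strong;
}

// `compress` is an injected async function ({ query, input }) => ({ output, receipt }).
async function routeCompressed({ question, context, compress, models }) {
  const { output, receipt } = await compress({ query: question, input: context });
  const messages = buildMessages(question, output);
  const model = pickModel(models, output);
  const answer = await model.invoke(messages);
  return { answer, receipt };
}
```

Because compression happens before routing, the router decides on the reduced prompt rather than the raw context, and swapping providers does not change the compression step.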
Minimal pattern
Keep the Adola key on your server. The important part is not the framework wrapper; it is the placement: compress after context assembly and before the expensive model call.
async function compressBeforeModel({ question, documents, llm }) {
  // Join the retrieved documents into one labeled context block.
  const context = documents
    .map((doc, index) => `[doc ${index + 1}] ${doc.pageContent}`)
    .join("\n\n");
  // Compress the context against the user's question (server-side only).
  const response = await fetch("https://api.adola.app/v1/compress", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.ADOLA_API_KEY}`
    },
    body: JSON.stringify({
      model: "rose-1",
      query: question,
      input: context,
      compression: { target_ratio: 0.35, preserve_order: true }
    })
  });
  // Fail loudly instead of passing an error payload to the model as context.
  if (!response.ok) {
    throw new Error(`Adola compress failed: HTTP ${response.status}`);
  }
  const compressed = await response.json();
  // Answer from the compressed text, not the original documents.
  const answer = await llm.invoke([
    ["system", "Answer from the compressed context. Say when context is insufficient."],
    ["human", `Question: ${question}\n\nContext:\n${compressed.output}`]
  ]);
  return { answer, compressionReceipt: compressed.receipt };
}
Try the hop without a key
The public demo endpoint is capped, but it is enough to test one real retrieved context block or agent trace before creating a workspace.
curl -s https://api.adola.app/v1/demo/compress \
  -H 'content-type: application/json' \
  --data '{
    "model": "rose-1",
    "query": "What should the assistant answer?",
    "input": "Long LangChain retrieval context, tool output, or graph state...",
    "compression": { "target_ratio": 0.35, "preserve_order": true }
  }'
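The same request can be sent from Node 18+ with the built-in `fetch`, which makes it easy to compare sizes before and after. The sketch below assumes the demo endpoint returns the same JSON shape as the authenticated one (an `output` string), which this page does not confirm. `tryDemoCompress` and its `fetchImpl` parameter are illustrative names. It checks the HTTP status because a capped endpoint may reject requests.

```javascript
async function tryDemoCompress({ query, input, fetchImpl = fetch }) {
  const response = await fetchImpl("https://api.adola.app/v1/demo/compress", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      model: "rose-1",
      query,
      input,
      compression: { target_ratio: 0.35, preserve_order: true },
    }),
  });
  if (!response.ok) {
    throw new Error(`Demo compress failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // Assumption: the response carries the compressed text in `output`.
  const output = typeof data.output === "string" ? data.output : "";
  return {
    output,
    inputChars: input.length,
    outputChars: output.length,
    // Character ratio is only a rough proxy; token counts depend on the tokenizer.
    charRatio: input.length ? output.length / input.length : 0,
  };
}
```

Pasting one real retrieved context block or agent trace as `input` gives a quick read on how much the block shrinks before you wire the hop into a chain.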