The cost problem is usually not one giant prompt. It is the same run growing a little every step. A search result, test log, browser dump, and debugging trail may all be useful once, then keep getting carried forward after their value has faded.
Adola fits between agent context assembly and the downstream model call. Keep the full trace in your own system for audit and recovery, then send the model a smaller context block for the next step.
What usually bloats a coding-agent run
Compression works best on generous context that helps the agent orient itself but does not need to be copied verbatim into every following request.
- Repeated tool output
- Long error logs
- Old screenshots or page dumps
- Prior plans that no longer affect the next step
- Retrieved docs copied across several turns
- Verbose status trails from earlier attempts
Safe rollout pattern
Treat compression as a controlled hop before the model, not as a replacement for your source trace. The model receives less text, while your product keeps the raw evidence and a receipt for what changed.
- Keep system instructions, policies, and current user intent outside the compressed block.
- Compress the bulky run history before the next expensive model call.
- Log the receipt beside the agent step so saved tokens and risk flags are visible.
- Compare the answer against a full-context baseline before turning it on broadly.
const compressedRun = await adola.compress({
query: nextStep,
input: toolTraceAndPriorContext,
compression: {
target_ratio: 0.35,
preserve_order: true
}
});
const response = await model.responses.create({
model: "your-coding-agent-model",
input: compressedRun.output
});