Reduce Claude Code token costs in long agent runs

The cost problem is usually not one giant prompt. It is the same run growing a little every step. A search result, test log, browser dump, and debugging trail may all be useful once, then keep getting carried forward after their value has faded.

Adola fits between agent context assembly and the downstream model call. Keep the full trace in your own system for audit and recovery, then send the model a smaller context block for the next step.

Run toolsCollect traceCompressCall modelStore receipt

What usually bloats a coding-agent run

Compression works best on generous context that helps the agent orient itself but does not need to be copied verbatim into every following request.

Repeated tool output
Long error logs
Old screenshots or page dumps
Prior plans that no longer affect the next step
Retrieved docs copied across several turns
Verbose status trails from earlier attempts

Safe rollout pattern

Treat compression as a controlled hop before the model, not as a replacement for your source trace. The model receives less text, while your product keeps the raw evidence and a receipt for what changed.

Keep system instructions, policies, and current user intent outside the compressed block.
Compress the bulky run history before the next expensive model call.
Log the receipt beside the agent step so saved tokens and risk flags are visible.
Compare the answer against a full-context baseline before turning it on broadly.

const compressedRun = await adola.compress({
  query: nextStep,
  input: toolTraceAndPriorContext,
  compression: {
    target_ratio: 0.35,
    preserve_order: true
  }
});

const response = await model.responses.create({
  model: "your-coding-agent-model",
  input: compressedRun.output
});