Compress agent traces before the next model call

The failure mode is familiar: every successful tool call adds more text to the context for the next step. A generous agent trace can help the model recover from mistakes, but it also drives up cost and latency as repeated, stale, or low-value context accumulates.

Put prompt compression between trace assembly and the downstream model call. The agent still records the full trace internally, while the model receives a smaller context block for the next step.

Run tools → Assemble trace → Compress → Call model → Log receipt
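
The five stages above can be sketched as a single agent step. This is a minimal illustration under stated assumptions, not a definitive implementation: `runAgentStep`, the injected `runTools`, `compress`, `callModel`, and `logReceipt` functions, and the shape of `state` are all hypothetical names chosen for the example. The compressor is assumed to return an object with an `output` string.

```javascript
// Hypothetical agent step. Every dependency is injected, so nothing here
// is tied to a specific SDK or compression service.
async function runAgentStep({ runTools, compress, callModel, logReceipt }, state) {
  // 1. Run tools and append their results to the full internal trace.
  const toolResults = await runTools(state.nextAction);
  state.fullTrace.push(...toolResults);

  // 2. Assemble the trace text that would otherwise go to the model.
  const assembled = state.fullTrace.join("\n");

  // 3. Compress only the model-facing copy; state.fullTrace is left intact.
  const compressed = await compress({ query: state.nextAction, input: assembled });

  // 4. Call the model with the smaller context block.
  const answer = await callModel(compressed.output);

  // 5. Log a receipt beside the agent step (sizes in characters, as an assumption).
  const receipt = {
    step: state.stepIndex,
    inputChars: assembled.length,
    outputChars: compressed.output.length,
  };
  await logReceipt(receipt);

  state.stepIndex += 1;
  return { answer, receipt };
}
```

Because the full trace stays in `state.fullTrace`, a later step can still be replayed or debugged against the uncompressed record.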

Good candidates

Compression is most useful when the next step needs a broad memory of the run, but not every token from every intermediate artifact.

Tool call results
Retrieved documents
Prior turns
Policy snippets
Planner notes
Error logs

Safe integration pattern

Keep system instructions, policy, and must-cite facts outside the part you reduce. Use Rose 1 on the bulky trace context that would otherwise go straight into the next model request.

Keep protected instructions and safety policy intact.
Compress only context that would otherwise go to the next model call.
Log the compression receipt beside the agent step.
Compare final answer quality against the full-trace baseline.

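// Compress the assembled tool trace, passing the upcoming step as the query.
const compressedTrace = await adola.compress({
  query: nextAgentStep,
  input: assembledToolTrace,
  compression: {
    target_ratio: 0.35,
    preserve_order: true
  }
});

// Send the compressed output, not the full trace, to the next model call.
const nextStep = await model.responses.create({
  model: "your-agent-model",
  input: compressedTrace.output
});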
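
The snippet above compresses the whole assembled trace. One way to keep protected instructions out of the reduced part, and to produce a receipt for logging, is to partition the context first. The sketch below is an assumption-laden illustration: the segment shape (`kind`, `text`), the `PROTECTED_KINDS` labels, the receipt fields, and the helpers `partitionSegments` and `buildNextInput` are hypothetical, and `compress` stands in for whatever compressor is in use, assumed to return an object with an `output` string.

```javascript
// Segment kinds that must reach the model verbatim (assumed labels).
const PROTECTED_KINDS = new Set(["system", "policy", "must_cite"]);

// Split assembled segments into protected ones and compressible trace text.
function partitionSegments(segments) {
  const protectedSegments = [];
  const compressible = [];
  for (const segment of segments) {
    if (PROTECTED_KINDS.has(segment.kind)) {
      protectedSegments.push(segment);
    } else {
      compressible.push(segment);
    }
  }
  return { protectedSegments, compressible };
}

// Build the next model input: protected text verbatim, then the compressed trace.
async function buildNextInput(segments, query, compress) {
  const { protectedSegments, compressible } = partitionSegments(segments);
  const traceText = compressible.map((s) => s.text).join("\n");

  // Skip the compressor entirely when there is nothing to reduce.
  const compressed = traceText.length > 0
    ? await compress({ query, input: traceText })
    : { output: "" };

  // Receipt to store beside the agent step (character counts, as an assumption).
  const receipt = {
    protectedCount: protectedSegments.length,
    compressedCount: compressible.length,
    charsBefore: traceText.length,
    charsAfter: compressed.output.length,
  };

  const input = [...protectedSegments.map((s) => s.text), compressed.output]
    .filter((text) => text.length > 0)
    .join("\n\n");
  return { input, receipt };
}
```

Running the same task once with this input and once with the full trace gives the pair of answers needed for the baseline comparison in the checklist.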