The failure mode is familiar: every successful tool call creates more text for the next step. A generous agent trace can help the model recover from mistakes, but it also raises cost and latency by carrying repeated, stale, or low-value context into every request.

Put prompt compression between trace assembly and the downstream model call. The agent still records the full trace internally, while the model receives a smaller context block for the next step.

Run tools → Assemble trace → Compress → Call model → Log receipt
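
The five stages above can be sketched as a single agent step. This is a minimal sketch under stated assumptions, not a vendor API: `runAgentStep` and the injected `runTools`, `compress`, `callModel`, and `logReceipt` functions are hypothetical names, and the receipt fields are chosen only for illustration.

```javascript
// Hypothetical agent step. Every dependency is injected, so none of these
// names belong to a real SDK.
async function runAgentStep({ runTools, compress, callModel, logReceipt }, task) {
  // 1. Run tools and keep their raw results.
  const toolResults = await runTools(task);

  // 2. Assemble the full trace. This is the copy the agent keeps internally.
  const fullTrace = toolResults
    .map((result) => `[${result.tool}] ${result.output}`)
    .join("\n");

  // 3. Compress only the copy that is sent to the model.
  const compressed = await compress({ query: task, input: fullTrace });

  // 4. Call the model with the smaller context block.
  const answer = await callModel(compressed.output);

  // 5. Log a receipt beside the step so the reduction can be audited later.
  const receipt = {
    task,
    fullChars: fullTrace.length,
    compressedChars: compressed.output.length,
  };
  await logReceipt(receipt);

  return { answer, fullTrace, receipt };
}
```

Because the compressor is injected, the same step can be run with a pass-through function in its place to produce the full-trace baseline.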

Good candidates

Compression is most useful when the next step needs a broad memory of the run, but not every token from every intermediate artifact.

  • Tool call results
  • Retrieved documents
  • Prior turns
  • Policy snippets
  • Planner notes
  • Error logs
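
One way to decide which of these sources is worth compressing is to measure how much of the trace each one occupies. The sketch below assumes a hypothetical entry shape of `{ kind, text }` and uses character counts as a rough stand-in for tokens; the `kind` labels are illustrative, not a fixed schema.

```javascript
// Sum trace size per kind so the bulkiest sources stand out.
// Entry shape { kind, text } is an assumption made for this example.
function sizeByKind(entries) {
  const totals = new Map();
  for (const { kind, text } of entries) {
    totals.set(kind, (totals.get(kind) ?? 0) + text.length);
  }
  // Largest contributors first.
  return [...totals.entries()]
    .map(([kind, chars]) => ({ kind, chars }))
    .sort((a, b) => b.chars - a.chars);
}

const breakdown = sizeByKind([
  { kind: "tool_result", text: "x".repeat(4000) },
  { kind: "prior_turn", text: "y".repeat(600) },
  { kind: "tool_result", text: "z".repeat(1000) },
]);
// breakdown[0] is { kind: "tool_result", chars: 5000 }
```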

Safe integration pattern

Keep system instructions, policy, and must-cite facts outside the part you reduce. Use Rose 1 on the bulky trace context that would otherwise go straight into the next model request.

  • Keep protected instructions and safety policy intact.
  • Compress only context that would otherwise go to the next model call.
  • Log the compression receipt beside the agent step.
  • Compare final answer quality against the full-trace baseline.
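
// Compress only the assembled trace; the full trace stays in the agent's own records.
const compressedTrace = await adola.compress({
  query: nextAgentStep,
  input: assembledToolTrace,
  compression: {
    target_ratio: 0.35,
    preserve_order: true
  }
});

// Send the reduced context to the downstream model.
const nextStep = await model.responses.create({
  model: "your-agent-model",
  input: compressedTrace.output
});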
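
The checklist above can be supported by two small helpers: one that keeps protected segments out of the compressor's input, and one that builds the receipt to log beside the step. This is a sketch only. The segment shape `{ text, protected }`, the helper names, and the receipt fields are all hypothetical, and the ratio is assumed to mean compressed size divided by original size, which the compression service may define differently.

```javascript
// Separate segments that must reach the model verbatim from those that may be reduced.
// The { text, protected } shape is an assumption for this example.
function splitContext(segments) {
  const protectedParts = [];
  const compressible = [];
  for (const segment of segments) {
    (segment.protected ? protectedParts : compressible).push(segment);
  }
  return { protectedParts, compressible };
}

// Build a receipt to store next to the agent step.
// Assumes ratio = compressed length / original length, measured in characters.
function buildReceipt(stepId, originalText, compressedText, targetRatio) {
  const achievedRatio =
    originalText.length === 0 ? 1 : compressedText.length / originalText.length;
  return {
    stepId,
    originalChars: originalText.length,
    compressedChars: compressedText.length,
    targetRatio,
    achievedRatio,
  };
}
```

Storing the achieved ratio with each step makes it easier to line up any drop in answer quality against how aggressively that step was compressed when comparing with the full-trace baseline.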