The failure mode is familiar: every successful tool call creates more text for the next step. A generous agent trace can help the model recover from mistakes, but it also pushes cost and latency upward with repeated, stale, or low-value context.
Put prompt compression between trace assembly and the downstream model call. The agent still records the full trace internally, while the model receives a smaller context block for the next step.
Good candidates
Compression is most useful when the next step needs a broad memory of the run, but not every token from every intermediate artifact.
- Tool call results
- Retrieved documents
- Prior turns
- Policy snippets
- Planner notes
- Error logs
Safe integration pattern
Keep system instructions, policy, and must-cite facts outside the part you reduce. Use Rose 1 on the bulky trace context that would otherwise go straight into the next model request.
- Keep protected instructions and safety policy intact.
- Compress only context that would otherwise go to the next model call.
- Log the compression receipt beside the agent step.
- Compare final answer quality against the full-trace baseline.
const compressedTrace = await adola.compress({
query: nextAgentStep,
input: assembledToolTrace,
compression: {
target_ratio: 0.35,
preserve_order: true
}
});
const nextStep = await model.responses.create({
model: "your-agent-model",
input: compressedTrace.output
});