Usage-limit complaints usually show up at the worst time: the agent is deep in a task, the trace is huge, and each new turn drags more history through the model. The answer is not to throw the trace away. Keep it for audit and recovery, and compress the bulky context before the next model call.
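To see why long runs hit usage limits, here is a back-of-the-envelope sketch. It assumes that every turn resends the full prior history plus a fixed amount of new output. The `cumulativeInputTokens` function and its parameters are hypothetical and not part of any Adola API. `historyRatio` scales earlier history to mimic compressing it.

```javascript
// Total input tokens across a run when every turn resends prior history.
// Hypothetical model: each turn adds `tokensPerTurn` new tokens, the newest
// turn is sent in full, and earlier history is scaled by `historyRatio`
// (1 = no compression). Both parameters are illustrative assumptions.
function cumulativeInputTokens(turns, tokensPerTurn, historyRatio = 1) {
  let total = 0;
  for (let i = 1; i <= turns; i++) {
    const history = (i - 1) * tokensPerTurn; // everything before this turn
    total += tokensPerTurn + historyRatio * history;
  }
  return total;
}

// Uncompressed, the total grows quadratically: tokensPerTurn * n(n+1)/2.
const raw = cumulativeInputTokens(50, 2000); // 2,550,000 tokens
const trimmed = cumulativeInputTokens(50, 2000, 0.35); // about 957,500 tokens
console.log({ raw, trimmed });
```

In this model, the cost of each turn grows linearly with the history, so the total for the whole run grows quadratically. Shrinking earlier history is what bends that curve.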
Adola sits between context assembly and the downstream model. Your product keeps the complete run record. The model gets a smaller working context for the next step.
What burns tokens in long Codex-style runs
The biggest waste is usually not source code. It is repeated evidence from earlier steps that was useful once and has since become background noise.
- Repeated terminal output
- Browser snapshots and screenshots
- Old build logs
- Search results copied across turns
- Prior plans after the task has moved on
- Large file excerpts that only mattered once
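Before compressing anything, it can help to measure which of these categories dominates. The sketch below assumes that each context item is tagged with a hypothetical `kind` label (for example `terminal` or `plan`). It uses a rough four-characters-per-token estimate instead of a real tokenizer. It tallies estimated tokens per kind and counts the tokens spent on exact repeats of earlier text, such as repeated terminal output or search results copied across turns.

```javascript
// Rough estimate: about 4 characters per token for English text.
// Swap in a real tokenizer when exact counts matter.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Tally estimated tokens per context kind, and count tokens spent on
// exact repeats of text that already appeared earlier in the run.
function auditContext(items) {
  const byKind = {};
  const seen = new Set();
  let duplicateTokens = 0;
  for (const { kind, text } of items) {
    const tokens = estimateTokens(text);
    byKind[kind] = (byKind[kind] ?? 0) + tokens;
    if (seen.has(text)) {
      duplicateTokens += tokens;
    } else {
      seen.add(text);
    }
  }
  return { byKind, duplicateTokens };
}

// Example: the same failing test output pasted into two turns.
const testOutput = "npm test\n3 failing".repeat(50);
const report = auditContext([
  { kind: "terminal", text: testOutput },
  { kind: "terminal", text: testOutput },
  { kind: "plan", text: "1. Fix the parser. 2. Re-run tests." },
]);
console.log(report);
```

Here the repeated test output accounts for 225 of the 459 estimated tokens. That share is the first thing a compression hop should go after.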
A safer pattern
Treat compression as a pre-model hop. Keep your source files, logs, and trace store as the source of truth. Compress only the payload that would otherwise be copied into the next request.
- Keep the raw run history in your own trace.
- Send the model only the context needed for the next action.
- Preserve current user intent, policies, commands, and active errors.
- Record a receipt so saved tokens and risk flags are visible.
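The four steps above can be sketched without tying them to a particular compressor. This sketch assumes that trace items carry a hypothetical `kind` label. Preserved kinds (user intent, policies, commands, active errors) pass through verbatim, and only the remaining bulk goes through `compress`. The trace array itself is never modified, and a small receipt records estimated tokens and simple risk flags. `buildWorkingContext`, `toyCompress`, the kind labels, and the flag names are all illustrative and are not part of Adola's API.

```javascript
// Item kinds that must reach the model verbatim. These labels are
// hypothetical; map them to whatever your trace store actually records.
const PRESERVE = new Set(["user_intent", "policy", "command", "active_error"]);

// Rough estimate (~4 characters per token); use a real tokenizer in practice.
const estimateTokens = (text) => Math.ceil(text.length / 4);
const render = (item) => `[${item.kind}] ${item.text}`;

// Builds the context for the next model call without mutating `trace`,
// so the full run history stays available for audit and recovery.
// `compress` is any async (text) => string function standing in for a
// real compression hop; it only ever sees the non-preserved items.
async function buildWorkingContext(trace, compress) {
  const preserved = trace.filter((item) => PRESERVE.has(item.kind));
  const bulky = trace.filter((item) => !PRESERVE.has(item.kind));

  const compressedBulk =
    bulky.length > 0 ? await compress(bulky.map(render).join("\n")) : "";
  const context = [...preserved.map(render), compressedBulk]
    .filter((part) => part.length > 0)
    .join("\n");

  // Receipt: estimated cost of sending the full trace versus what is sent.
  const rawTokens = estimateTokens(trace.map(render).join("\n"));
  const sentTokens = estimateTokens(context);
  const flags = [];
  if (sentTokens >= rawTokens) flags.push("no_savings");
  if (bulky.length > 0 && compressedBulk.length === 0) flags.push("bulk_dropped");

  return {
    context,
    receipt: { rawTokens, sentTokens, savedTokens: rawTokens - sentTokens, flags },
  };
}

// Toy stand-in compressor for local testing: keeps the last 120 characters.
const toyCompress = async (text) => text.slice(-120);
```

This sketch places preserved items first and the compressed bulk after them, so it loses chronological order. If the next step depends on the order of events, interleave the items instead.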
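In code, the hop is a single call made just before the model request:

```javascript
// Assumes `adola` (an Adola client) and `model` (an OpenAI-style client
// exposing `responses.create`) are initialized elsewhere, and that
// `priorToolOutputAndNotes` holds the bulky context gathered so far.
const compressed = await adola.compress({
  query: "What should the agent do next?", // the question the context must serve
  input: priorToolOutputAndNotes,
  compression: { target_ratio: 0.35 },
});

// Send only the compressed working context to the downstream model;
// the raw trace stays in your own store.
await model.responses.create({
  model: "your-agent-model",
  input: compressed.output,
});
```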