Codex usage limits and token costs in long agent runs

Usage-limit complaints usually show up at the worst time: the agent is deep in a task, the trace is huge, and each new turn drags more history through the model. The answer is not to throw away the trace. Keep it for audit and recovery, then compress the bulky context before the next model call.

Adola sits between context assembly and the downstream model. Your product keeps the complete run record. The model gets a smaller working context for the next step.

Run task → Collect trace → Trim context → Call model → Log receipt

What burns tokens in long Codex-style runs

The biggest waste is usually not source code. It is evidence from previous steps that was useful once, keeps getting copied into later turns, and has since become background noise:

Repeated terminal output
Browser snapshots and screenshots
Old build logs
Search results copied across turns
Prior plans after the task has moved on
Large file excerpts that only mattered once
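
To see which of these categories actually dominates a given run, a rough per-category tally can help. The sketch below is illustrative only: the TraceEntry shape, the category names, and the four-characters-per-token estimate are assumptions, not part of Adola or any Codex API. Swap in your model's real tokenizer for accurate numbers.

```typescript
// Hypothetical trace categories, mirroring the list above.
const KINDS = ["terminal", "browser", "build_log", "search", "plan", "file_excerpt"] as const;
type TraceKind = (typeof KINDS)[number];

// Hypothetical entry shape; adapt it to whatever your trace store records.
interface TraceEntry {
  kind: TraceKind;
  text: string;
}

interface KindUsage {
  total: number;    // estimated tokens across all entries of this kind
  repeated: number; // estimated tokens in entries whose exact text appeared earlier
}

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Tally estimated tokens per category and how many of them are exact repeats.
function tokenReport(entries: TraceEntry[]): Record<TraceKind, KindUsage> {
  const report = {} as Record<TraceKind, KindUsage>;
  for (const kind of KINDS) {
    report[kind] = { total: 0, repeated: 0 };
  }
  const seen = new Set<string>();
  for (const entry of entries) {
    const tokens = estimateTokens(entry.text);
    report[entry.kind].total += tokens;
    if (seen.has(entry.text)) {
      report[entry.kind].repeated += tokens;
    }
    seen.add(entry.text);
  }
  return report;
}
```

A high repeated share for a category suggests that its output is being copied forward turn after turn and is a good candidate for compression. This sketch only catches exact duplicates, so near-duplicates such as build logs that differ by a timestamp would need a looser comparison.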

A safer pattern

Treat compression as a pre-model hop. Your source files, logs, and trace store remain the source of truth. Compress only the payload that would otherwise be copied into the next request.

Keep the raw run history in your own trace.
Send the model only the context needed for the next action.
Preserve current user intent, policies, commands, and active errors.
Record a receipt so saved tokens and risk flags are visible.

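// Compress the bulky history before the next model call.
// priorToolOutputAndNotes is the tool output and notes gathered so far in this run.
const compressed = await adola.compress({
  query: "What should the agent do next?",
  input: priorToolOutputAndNotes,
  compression: { target_ratio: 0.35 }
});

// Send the compressed context, not the raw trace, to the downstream model.
await model.responses.create({
  model: "your-agent-model",
  input: compressed.output
});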
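
The snippet above covers the compression hop itself. The other steps of the pattern (passing protected items through untouched and recording a receipt) could be sketched as follows. Everything here is an assumption made for illustration: the ContextItem roles, the Receipt fields, the risk flag names, and the Compressor type, which stands in for whatever compression call you use and is not Adola's actual interface. Token counts use the same rough four-characters-per-token estimate.

```typescript
// Hypothetical item shape: everything except "history" is protected.
interface ContextItem {
  role: "user_intent" | "policy" | "command" | "active_error" | "history";
  text: string;
}

// Hypothetical receipt shape for logging alongside the run.
interface Receipt {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  riskFlags: string[];
}

// Stand-in for a compression call; the signature is an assumption.
type Compressor = (text: string) => Promise<string>;

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

async function buildNextContext(
  items: ContextItem[],
  compress: Compressor
): Promise<{ context: string; receipt: Receipt }> {
  // Protected items pass through verbatim; only history is compressed.
  const protectedTexts = items.filter((i) => i.role !== "history").map((i) => i.text);
  const history = items.filter((i) => i.role === "history").map((i) => i.text).join("\n");
  const compressedHistory = history.length > 0 ? await compress(history) : "";

  const join = (parts: string[]): string => parts.filter((p) => p.length > 0).join("\n");
  const tokensBefore = estimateTokens(join([...protectedTexts, history]));
  const context = join([...protectedTexts, compressedHistory]);
  const tokensAfter = estimateTokens(context);

  // Simple, illustrative risk checks; real checks depend on your product.
  const riskFlags: string[] = [];
  if (history.length > 0 && compressedHistory.length === 0) riskFlags.push("history_dropped");
  if (tokensAfter >= tokensBefore) riskFlags.push("no_savings");

  return {
    context,
    receipt: { tokensBefore, tokensAfter, tokensSaved: tokensBefore - tokensAfter, riskFlags },
  };
}
```

Keeping the protected items out of the compressor entirely means the current user intent, policies, commands, and active errors cannot be altered by compression. Store the receipt next to the raw trace so that savings and flags can be audited later.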