Most LLM applications accumulate bulky context: retrieved documents, copied logs, repeated tool output, support history, account notes, and long agent traces. That context often reaches the most expensive model call unchanged, even when much of it is not needed for the next answer.

Rose 1 sits after your retrieval, routing, or context assembly step. Send the context block you were about to pass downstream; receive a smaller prompt plus a receipt that makes savings and tradeoffs visible.

Why this matters

Token savings alone are not enough. Teams also need readable outputs, predictable controls, and a receipt they can log next to the model request. That lets you compare compressed and full-context runs without treating lower token count as the only success metric.

  • Connect: Send the context payload you would normally pass to the downstream model.
  • Reduce: Rose 1 returns a smaller, model-ready version for the next call.
  • Protect: Critical instructions, schemas, citations, and quoted text can stay intact.
  • Measure: Every run includes a receipt so teams can compare cost, latency, and output quality.
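One way to log a receipt next to the model request and compare runs is sketched below in Python. This is a minimal sketch, not Rose 1's client API: `RunRecord`, `to_log_line`, `compare_runs`, and the `quality_score` field are hypothetical names introduced for illustration, and the quality score is assumed to come from your own evaluation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunRecord:
    """One downstream model call. All field names here are hypothetical."""
    request_id: str                 # your own identifier for the model request
    variant: str                    # "full" or "compressed"
    prompt_tokens: int              # tokens actually sent downstream
    latency_ms: float               # end-to-end latency you measured
    quality_score: float            # from your own evaluation, not from Rose 1
    receipt: Optional[dict] = None  # receipt for compressed runs, else None


def to_log_line(record: RunRecord) -> str:
    """Serialize the run and its receipt as a single JSON log line."""
    return json.dumps(asdict(record), sort_keys=True)


def compare_runs(full: RunRecord, compressed: RunRecord) -> dict:
    """Report token, latency, and quality differences for the same request."""
    return {
        "token_reduction": 1 - compressed.prompt_tokens / full.prompt_tokens,
        "latency_delta_ms": compressed.latency_ms - full.latency_ms,
        "quality_delta": compressed.quality_score - full.quality_score,
    }
```

Keeping the quality delta in the same record as the token reduction makes it harder to treat a lower token count as the only success metric.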

What gets measured

Every compression response includes original tokens, output tokens, tokens saved, compression ratio, latency, and risk flags. That receipt is what lets a team compare full-context and compressed runs before rolling the path into production traffic.

{
  "receipt": {
    "original_tokens": 1840,
    "output_tokens": 621,
    "tokens_saved": 1219,
    "compression_ratio": 0.337,
    "latency_ms": 84,
    "risk_level": "low"
  }
}
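
Before logging a receipt, a team may want to confirm that its fields agree with each other. The Python sketch below checks the sample receipt above. It assumes that `compression_ratio` means output tokens divided by original tokens, which matches the sample (621 / 1840 is about 0.3375) but is not stated in this section. `check_receipt` and `percent_saved` are hypothetical helper names.

```python
import json
from typing import List

# The sample receipt shown above, copied verbatim.
SAMPLE = """
{
  "receipt": {
    "original_tokens": 1840,
    "output_tokens": 621,
    "tokens_saved": 1219,
    "compression_ratio": 0.337,
    "latency_ms": 84,
    "risk_level": "low"
  }
}
"""


def check_receipt(receipt: dict, tolerance: float = 1e-3) -> List[str]:
    """Return a list of inconsistencies; an empty list means the fields agree."""
    problems = []
    original = receipt["original_tokens"]
    output = receipt["output_tokens"]
    if receipt["tokens_saved"] != original - output:
        problems.append("tokens_saved != original_tokens - output_tokens")
    # Assumption: compression_ratio = output_tokens / original_tokens.
    if original > 0 and abs(receipt["compression_ratio"] - output / original) > tolerance:
        problems.append("compression_ratio != output_tokens / original_tokens")
    return problems


def percent_saved(receipt: dict) -> float:
    """Share of the original tokens that were removed, as a percentage."""
    return 100 * receipt["tokens_saved"] / receipt["original_tokens"]


receipt = json.loads(SAMPLE)["receipt"]
problems = check_receipt(receipt)
```

The tolerance allows for the ratio being rounded or truncated to three decimal places in the response.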