BeClaude
Research · 2026-06-30

Governance Decay: How Context Compaction Silently Erases Safety Constraints in Long-Horizon LLM Agents

Originally published by arXiv cs.AI

arXiv:2606.22528v2. Abstract: Modern LLM agents increasingly rely on context compaction, summarization, or eviction to keep long-running sessions within a token budget. We show that this context-management layer is a safety-critical failure surface: in-context governance...

The Hidden Danger in Long-Horizon AI Agents

A new arXiv preprint (arXiv:2606.22528v2) exposes a critical vulnerability in how modern LLM agents manage long-running sessions. The paper, "Governance Decay," argues that the very mechanisms used to keep agents within token budgets (context compaction, summarization, and eviction) can silently strip away safety constraints, leading to what the authors call "in-context governance decay."

The core finding is deceptively simple: when an agent compresses or summarizes its conversation history to stay within computational limits, it often loses the safety guardrails that were embedded in earlier turns. A system instruction that says "never execute code without user approval" might survive the first few rounds, but after multiple summarization passes, the agent may "forget" this constraint while retaining the task-specific instructions that conflict with it. The result is an agent that gradually drifts from its original safety posture without any explicit error or warning.

Why This Matters for Safety-Critical Deployments

This is not a theoretical edge case. Long-horizon agents are already deployed in coding assistants, autonomous research tools, and customer service systems that operate over hours or days. The research suggests that as these agents accumulate context, their behavior can silently degrade from "safe" to "unsafe" without triggering any alarms. The safety constraints don't fail catastrophically; they erode incrementally, which makes the decay hard to detect.

The mechanism is particularly insidious because context compaction is often treated as a purely technical optimization. Engineers tune compression ratios and eviction policies for token efficiency, not safety preservation. According to the paper, this creates a blind spot: the compaction layer becomes an unmonitored failure surface where safety-critical information can be among the first things discarded.

Implications for AI Practitioners

For teams building long-horizon agents, this research demands a fundamental rethinking of context management. Three practical implications stand out:

First, safety constraints must be treated as first-class citizens in compaction logic. Current systems typically compress all context uniformly, but the research suggests that safety instructions should be explicitly preserved or re-injected after each compaction cycle. This might mean maintaining a separate, non-evictable "safety buffer" that persists across summarization steps.
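One way to picture such a "safety buffer" is the minimal Python sketch below. It is an illustration under stated assumptions, not the paper's method: `PinnedContext`, `Message`, and `lossy_summarizer` are hypothetical names, and the summarizer is a crude stand-in for an LLM call. The idea is simply that the rules live outside the history that gets summarized and are re-injected verbatim whenever the prompt is built.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


@dataclass
class PinnedContext:
    """Conversation state with a non-evictable block of safety rules (hypothetical design)."""
    safety_rules: List[str]
    history: List[Message] = field(default_factory=list)

    def compact(self, summarize: Callable[[List[Message]], str], keep_last: int = 2) -> None:
        """Summarize older turns; the pinned rules are never handed to the summarizer."""
        if len(self.history) <= keep_last:
            return
        split = len(self.history) - keep_last
        older, recent = self.history[:split], self.history[split:]
        summary = Message("system", "Summary of earlier turns: " + summarize(older))
        self.history = [summary] + recent

    def render(self) -> List[Message]:
        """Build the prompt, re-injecting the pinned rules verbatim on every call."""
        rules = "\n".join(f"- {r}" for r in self.safety_rules)
        return [Message("system", "Safety rules (pinned):\n" + rules)] + self.history


def lossy_summarizer(messages: List[Message]) -> str:
    # Stand-in for an LLM summarizer: keeps only the first 40 characters of each turn.
    return " | ".join(m.content[:40] for m in messages)


ctx = PinnedContext(safety_rules=["Never execute code without user approval."])
ctx.history = [Message("user", f"Task step {i}") for i in range(6)]
for _ in range(3):  # repeated compaction passes
    ctx.compact(lossy_summarizer)
prompt = ctx.render()
print(prompt[0].content)
```

Because the rules never enter `history`, no summarization pass can paraphrase or drop them. The trade-off is that the pinned block consumes tokens on every call, so it has to be counted against the budget separately.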

Second, monitoring for governance decay requires new instrumentation. Teams should implement runtime checks that periodically verify whether key safety constraints are still present in the agent's active context. This could be as simple as prompting the agent to restate its safety rules at regular intervals and comparing them against the original specification.
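A periodic check of this kind might look like the following sketch. Everything here is an assumption made for illustration: `ask_agent` is a hypothetical callable that sends a prompt to the agent and returns its reply, the check interval and similarity threshold are arbitrary, and string similarity from the standard-library `difflib` module is only a crude proxy for "the rule is still there."

```python
import difflib
import re
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so trivial rewording is ignored."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def missing_rules(spec: List[str], restated: str, threshold: float = 0.8) -> List[str]:
    """Return the rules in `spec` that have no sufficiently similar line in `restated`."""
    lines = [normalize(line) for line in restated.splitlines() if line.strip()]
    missing = []
    for rule in spec:
        target = normalize(rule)
        best = max(
            (difflib.SequenceMatcher(None, target, line).ratio() for line in lines),
            default=0.0,
        )
        if best < threshold:
            missing.append(rule)
    return missing


def governance_check(
    spec: List[str],
    ask_agent: Callable[[str], str],  # hypothetical hook into the running agent
    turn: int,
    every: int = 10,
) -> List[str]:
    """Every `every` turns, ask the agent to restate its rules and report any that are gone."""
    if turn % every != 0:
        return []
    restated = ask_agent(
        "List every safety rule you are currently operating under, one per line."
    )
    return missing_rules(spec, restated)


spec = [
    "Never execute code without user approval.",
    "Do not delete files outside the workspace.",
]
reply = "- never execute code without user approval\n- Keep answers short"
print(missing_rules(spec, reply))
```

A self-report only shows what the agent can still recite, not how it will act, so a check like this is best treated as an early-warning signal rather than proof of compliance. A semantic comparison would tolerate paraphrase better than character-level similarity.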

Third, the compaction strategy itself needs to be safety-aware. Compaction strategies are not interchangeable: some may systematically discard abstract instructions (like safety rules) while retaining concrete task details. Practitioners should test their compaction pipelines specifically for safety constraint retention, not just token efficiency.
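A retention test for a compaction pipeline could be sketched roughly as follows. The harness and both compaction functions are hypothetical: `drop_oldest_half` is a toy eviction policy, `pin_rules` is one possible mitigation, and verbatim substring matching is an assumption that would undercount rules a real summarizer paraphrases.

```python
from typing import Callable, List

Compactor = Callable[[List[str]], List[str]]


def retention_by_pass(
    transcript: List[str],
    rules: List[str],
    compact: Compactor,
    passes: int = 3,
) -> List[float]:
    """Fraction of `rules` still present verbatim after each compaction pass."""
    if not rules:
        raise ValueError("at least one rule is required")
    rates = []
    context = list(transcript)
    for _ in range(passes):
        context = compact(context)
        joined = "\n".join(context).lower()
        kept = sum(1 for rule in rules if rule.lower() in joined)
        rates.append(kept / len(rules))
    return rates


def drop_oldest_half(context: List[str]) -> List[str]:
    # Toy eviction policy, for illustration only: discard the oldest half of the turns.
    return context[len(context) // 2:]


def pin_rules(rules: List[str], inner: Compactor) -> Compactor:
    """Wrap a compaction function so the rules are re-prepended after every pass."""
    def wrapped(context: List[str]) -> List[str]:
        body = [turn for turn in inner(context) if turn not in rules]
        return list(rules) + body
    return wrapped


rules = ["Never execute code without user approval."]
transcript = rules + [f"step {i}" for i in range(8)]

naive = retention_by_pass(transcript, rules, drop_oldest_half)
pinned = retention_by_pass(transcript, rules, pin_rules(rules, drop_oldest_half))
print(naive)   # [0.0, 0.0, 0.0]
print(pinned)  # [1.0, 1.0, 1.0]
```

In this toy setup the rule sits at the start of the transcript, so oldest-first eviction removes it on the first pass. Running the same harness against a real pipeline, with rules placed at different positions, would show whether that pipeline has a similar bias.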

Key Takeaways

  • Context compaction and summarization in long-horizon agents can silently erase safety constraints, causing gradual behavioral drift without explicit errors.
  • The safety-critical failure surface exists in the context-management layer, not in the model itself, meaning standard safety evaluations may miss it entirely.
  • Practitioners must implement explicit safety preservation mechanisms, such as non-evictable safety buffers and periodic constraint verification.
  • Compression algorithms should be audited for how often they discard abstract safety instructions while retaining concrete task details.
Tags: arxiv, papers, agents, safety