OnDeFog: Online Decision Transformer under Frame Dropping
arXiv:2606.19721v1 (cross-listed). Abstract: In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation...
What Happened
Researchers have introduced OnDeFog (Online Decision Transformer under Frame Dropping), a framework for a practical but understudied problem in reinforcement learning: performance degradation caused by frame dropping. Standard RL setups assume the agent receives every state-reward pair, but in real-world deployments communication delays or sensor failures cause intermittent data loss. OnDeFog extends the Decision Transformer architecture, which treats RL as a sequence modeling problem, to the online setting, where each dropped frame leaves a gap in the agent's observation history.
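Dropped rewards matter for this architecture specifically. At inference time a Decision Transformer is conditioned on a return-to-go: a target return chosen by the user, decremented by each reward the agent receives. The sketch below is illustrative and not taken from the paper; it assumes, hypothetically, that a dropped frame delivers its reward as `None` and that the agent simply leaves the return-to-go unchanged when that happens.

```python
from typing import List, Optional


def online_returns_to_go(target_return: float,
                         rewards: List[Optional[float]]) -> List[float]:
    """Return-to-go fed to the model at each step of an online rollout.

    rewards[t] is the reward delivered with frame t, or None if that frame
    was dropped (a hypothetical convention for this sketch).
    """
    rtg = target_return
    history: List[float] = []
    for r in rewards:
        history.append(rtg)  # value the model is conditioned on at step t
        if r is not None:
            rtg -= r  # usual Decision Transformer update: R_{t+1} = R_t - r_t
        # If r is None the reward was lost, so rtg is not decremented and
        # overstates the true remaining return from here on.
    return history
```

With a target of 10 and rewards `[1.0, None, 1.0]`, the sequence is `[10.0, 9.0, 9.0]` instead of the `[10.0, 9.0, 8.0]` the agent would see if the middle reward of 1.0 had arrived, so the conditioning signal drifts by the size of every lost reward. OnDeFog may handle this differently.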
The core innovation lies in how OnDeFog handles these missing observations without requiring explicit imputation or retraining. Because a transformer attends over a sequence of tokens of varying length, the model can learn to act even when portions of the trajectory are absent. The framework introduces a masking mechanism that lets the transformer distinguish observed frames from missing ones, so that placeholder entries for dropped frames are not mistaken for real observations when the model chooses an action.
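A masking mechanism of this kind can be illustrated with ordinary scaled dot-product attention and a key mask: positions flagged as dropped get a score of negative infinity, so their softmax weight is exactly zero and they contribute nothing to the output. This is the generic textbook technique, not OnDeFog's implementation; the `observed` flags and the plain-list vectors are assumptions that keep the sketch dependency-free.

```python
import math
from typing import List, Sequence


def masked_attention(query: Sequence[float],
                     keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]],
                     observed: Sequence[bool]) -> List[float]:
    """Single-query scaled dot-product attention that skips dropped frames."""
    if not any(observed):
        raise ValueError("at least one frame must be observed")
    scale = 1.0 / math.sqrt(len(query))
    # Dropped positions get -inf so that exp() maps them to a weight of 0.
    scores = [
        scale * sum(q * k for q, k in zip(query, key)) if ok else -math.inf
        for key, ok in zip(keys, observed)
    ]
    top = max(scores)  # finite, since at least one frame is observed
    weights = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of value vectors, one output dimension at a time.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

With three identical keys and values `[1.0]`, `[100.0]`, `[3.0]`, marking the middle frame as dropped yields `[2.0]`: the two observed values are averaged and the dropped one is ignored. Production code would do the same with tensor operations, for example by passing a boolean mask to a framework's attention function.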
Why It Matters
Frame dropping is not a niche edge case; it is common in real-world RL deployments. Autonomous vehicles momentarily lose camera feeds, industrial robots experience sensor blackouts, and cloud-based RL systems face network packet loss. Most standard RL algorithms assume the agent observes every state and reward, and widely used methods such as PPO or SAC can degrade when frames are dropped, because their value estimates bootstrap from consecutive transitions: a dropped frame removes both a state and its reward, breaking that chain.
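The reliance on unbroken sequences of transitions can be quantified with a simple model. A one-step bootstrapped target such as r + gamma * V(s') needs the state, the reward and the next state together, so with frames dropped independently at rate p, only about (1 - p)^2 of consecutive pairs survive intact: 0.8^2 = 0.64 for p = 0.2. The simulation below is a hypothetical illustration under that independence assumption, not a result reported for OnDeFog.

```python
import random
from typing import List


def usable_transitions(received: List[bool]) -> int:
    """Count pairs (frame t, frame t+1) where both frames arrived.

    Only these pairs yield a complete (s, r, s') tuple for a one-step
    bootstrapped target; any pair touching a dropped frame is unusable.
    """
    return sum(1 for a, b in zip(received, received[1:]) if a and b)


def usable_fraction(drop_prob: float, steps: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate assuming each frame is dropped independently."""
    rng = random.Random(seed)
    received = [rng.random() >= drop_prob for _ in range(steps)]
    return usable_transitions(received) / (steps - 1)
```

`usable_fraction(0.2)` should land close to 0.64, and `usable_fraction(0.0)` is exactly 1.0, so even a modest drop rate removes roughly a third of the one-step training signal.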
OnDeFog's significance lies in its practical orientation. Rather than proposing a theoretical solution that assumes idealized conditions, it directly confronts the messy reality of hardware limitations and network instability. The approach is particularly timely given the growing deployment of RL in safety-critical domains like healthcare monitoring, drone navigation, and manufacturing automation, where frame dropping is not just inconvenient but potentially dangerous.
From a research perspective, OnDeFog bridges two important trends: the rise of transformer-based RL (Decision Transformer, Trajectory Transformer) and the need for robust online learning under partial observability. It suggests that sequence modeling architectures may be inherently more resilient to data corruption than traditional value-based or policy-gradient methods, because they can learn to ignore or compensate for missing tokens through attention mechanisms.
Implications for AI Practitioners
For engineers deploying RL systems in production, OnDeFog offers a practical path forward. Instead of engineering complex failover systems or redundant sensor arrays to guarantee perfect frame delivery, practitioners can now consider using transformer-based policies that gracefully handle missing data. This could reduce infrastructure costs and simplify system design.
However, adoption requires careful consideration. OnDeFog's performance likely depends on the pattern of frame dropping: random (independent) and bursty (consecutive) losses may require different masking strategies. Additionally, transformer inference is more expensive than running a small policy network, and that overhead may be prohibitive for latency-sensitive applications such as real-time control.
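Random and bursty losses can be generated in a few lines to probe that sensitivity. Independent (Bernoulli) drops model occasional glitches; for bursts, the sketch swaps in a simplified two-state Markov model in the style of the Gilbert-Elliott channel from network-loss modeling, where every frame is dropped in the "bad" state and none in the "good" state. The function names and parameters are illustrative assumptions; which drop patterns OnDeFog was evaluated on is not stated here.

```python
import random
from typing import List


def iid_drops(n: int, p: float, rng: random.Random) -> List[bool]:
    """True marks a dropped frame; each frame is dropped independently with prob p."""
    return [rng.random() < p for _ in range(n)]


def bursty_drops(n: int, p_enter: float, p_exit: float,
                 rng: random.Random) -> List[bool]:
    """Two-state Markov chain in which the 'bad' state drops every frame.

    p_enter = P(good -> bad), p_exit = P(bad -> good). The long-run drop rate
    is p_enter / (p_enter + p_exit) and the mean burst length is 1 / p_exit.
    """
    bad = False
    drops = []
    for _ in range(n):
        if bad:
            bad = rng.random() >= p_exit  # stay bad with prob 1 - p_exit
        else:
            bad = rng.random() < p_enter
        drops.append(bad)
    return drops


def mean_burst_length(drops: List[bool]) -> float:
    """Average length of runs of consecutive dropped frames."""
    bursts, run = [], 0
    for dropped in drops:
        if dropped:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    return sum(bursts) / len(bursts) if bursts else 0.0
```

With p_enter = 0.02 and p_exit = 0.2, about 9% of frames are dropped (0.02 / 0.22), in bursts averaging 5 frames; independent drops at the same 9% rate produce bursts of only about 1.1 frames on average (1 / (1 - p)). A policy can cope with one pattern and fail on the other at an identical average drop rate, which is why both are worth testing.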
The work also highlights an important methodological shift: the RL community should benchmark algorithms not just on clean simulator environments but on corrupted observation streams that mimic real-world failures. OnDeFog provides a template for how to evaluate and design for such conditions.
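A corrupted-stream benchmark can start from a thin wrapper around an existing environment. The sketch below assumes a hypothetical minimal interface (`reset()` returning an observation, `step(action)` returning observation, reward and a done flag) rather than any particular library's API. A dropped frame hides both the observation and the reward from the agent, while the wrapper still records the true reward so that evaluation measures real task performance; the termination flag is assumed to always arrive so episodes still end.

```python
import random
from typing import Callable, Optional, Tuple


class CountdownEnv:
    """Toy environment (hypothetical): reward 1.0 per step for `length` steps."""

    def __init__(self, length: int = 10) -> None:
        self.length = length
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return float(self.length)

    def step(self, action: int) -> Tuple[float, float, bool]:
        self.t += 1
        return float(self.length - self.t), 1.0, self.t >= self.length


class FrameDropWrapper:
    """Drops each post-reset frame with probability drop_prob.

    A dropped frame returns None for both observation and reward. The done
    flag is always passed through (an assumption of this sketch).
    """

    def __init__(self, env: CountdownEnv, drop_prob: float, seed: int = 0) -> None:
        self.env = env
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)
        self.dropped = 0
        self.true_return = 0.0

    def reset(self) -> float:
        self.dropped = 0
        self.true_return = 0.0
        return self.env.reset()  # the first frame is assumed to arrive

    def step(self, action: int) -> Tuple[Optional[float], Optional[float], bool]:
        obs, reward, done = self.env.step(action)
        self.true_return += reward  # tracked for evaluation, hidden from the agent
        if self.rng.random() < self.drop_prob:
            self.dropped += 1
            return None, None, done
        return obs, reward, done


def run_episode(env: FrameDropWrapper,
                policy: Callable[[Optional[float]], int]) -> Tuple[float, float, int]:
    """Return (true return, return the agent received, frames dropped)."""
    obs: Optional[float] = env.reset()
    received, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        if reward is not None:
            received += reward
    return env.true_return, received, env.dropped
```

A benchmark would run the same policy at several values of `drop_prob`, and with different drop patterns, then report the true return at each setting; the gap between true and received return shows how much feedback the agent lost.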
Key Takeaways
- OnDeFog extends Decision Transformers to handle frame dropping without requiring data imputation or algorithm retraining, using a masking mechanism to distinguish observed from missing frames
- Frame dropping is a critical but often ignored problem in real-world RL, affecting autonomous systems, robotics, and cloud-based deployments where sensor or network failures are common
- Transformer-based RL architectures may offer inherent robustness to missing data compared to traditional methods, suggesting a shift in how practitioners should design for unreliable environments
- Practitioners should evaluate RL systems under realistic frame-dropping scenarios and consider OnDeFog's approach as a template for building production-ready policies that degrade gracefully rather than catastrophically