BeClaude
Research | 2026-06-29

A Primer on SO(3) Action Representations in Deep Reinforcement Learning

Originally published by arXiv cs.AI

arXiv:2510.11103v3 (announce type: replace-cross). Abstract: Many robotic control tasks require policies to act on orientations, yet the geometry of SO(3) makes this nontrivial. Because SO(3) admits no global, smooth, minimal parameterization, common representations such as Euler angles, quaternions,...

The latest revision of the preprint “A Primer on SO(3) Action Representations in Deep Reinforcement Learning” (arXiv:2510.11103) addresses a persistent, often overlooked bottleneck in robotic control: how to represent three-dimensional rotations in a way that neural networks can learn efficiently. The core problem is mathematical: the special orthogonal group SO(3) admits no parameterization that is simultaneously global, smooth, and minimal. Practitioners must therefore choose among representations (Euler angles, quaternions, rotation matrices, axis-angle) that each introduce distinct pathologies for learning.

What Happened

The paper systematically analyzes how different SO(3) representations affect the performance of deep reinforcement learning (DRL) agents in orientation-dependent tasks. It demonstrates that the choice of representation is not a trivial implementation detail but a structural design decision that can determine whether a policy converges, diverges, or gets stuck in local optima. The authors provide a formal treatment of the discontinuities, singularities, and non-convexities inherent in each representation, and show how these mathematical properties translate into practical training failures.

Why It Matters

This work is significant for three reasons. First, it fills a gap in the DRL literature, which has largely focused on state representation and reward design while treating orientation representation as a solved problem. Second, it provides concrete guidance for practitioners who are currently using Euler angles out of habit, unaware that the gimbal lock singularities they introduce can destabilize training. Third, it highlights a deeper issue: the DRL community often borrows representations from computer graphics or classical robotics without accounting for the unique demands of gradient-based optimization. A representation that works well for rendering or filtering may be catastrophic for policy gradients.
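To make the gimbal-lock point concrete, here is a minimal NumPy sketch. It assumes an intrinsic Z-Y-X (yaw, pitch, roll) convention, and the helper name `euler_zyx_to_matrix` is illustrative rather than taken from the paper. At a pitch of 90 degrees the rotation depends only on the difference between yaw and roll, so distinct action vectors collapse onto the same orientation.

```python
import numpy as np

def euler_zyx_to_matrix(yaw, pitch, roll):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed Z-Y-X convention)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx

# Two different (yaw, roll) pairs with the same difference yaw - roll.
a = euler_zyx_to_matrix(0.3, np.pi / 2, 0.1)
b = euler_zyx_to_matrix(0.5, np.pi / 2, 0.3)
same_at_singularity = np.allclose(a, b)

# Away from the singularity the same two pairs give different rotations.
c = euler_zyx_to_matrix(0.3, 0.4, 0.1)
d = euler_zyx_to_matrix(0.5, 0.4, 0.3)
same_elsewhere = np.allclose(c, d)

print(same_at_singularity, same_elsewhere)
```

Near the singular pitch, a whole one-parameter family of Euler-angle actions produces the same rotation, which is one way the loss of a degree of freedom can show up for a policy that outputs angles directly.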

Implications for AI Practitioners

For engineers building robotic manipulation systems, the immediate takeaway is that quaternions, while popular, are not a panacea. The paper shows that the double-cover property of unit quaternions (where q and -q represent the same rotation) creates a discontinuity in the loss landscape that can confuse value functions and policy networks: every orientation has two antipodal encodings, so nearby rotations can end up with encodings that are far apart. Similarly, rotation matrices, though continuous, are high-dimensional (nine numbers for three degrees of freedom), and raw network outputs must be re-orthonormalized to stay on the manifold.
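Both issues can be illustrated with a short NumPy sketch. The function names (`quat_to_matrix`, `canonicalize`, `project_to_so3`) and the (w, x, y, z) component order are assumptions made for this example, not the paper's API. Hemisphere canonicalization and SVD projection are common mitigations shown here for illustration; the source does not say which, if any, the authors recommend.

```python
import numpy as np

def quat_to_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.

    The input is normalized first, so any nonzero 4-vector is accepted.
    """
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def canonicalize(q):
    """Pick the representative with w >= 0 (one common convention).

    This removes the sign ambiguity but moves the discontinuity to w = 0.
    """
    q = np.asarray(q, dtype=float)
    return -q if q[0] < 0 else q

def project_to_so3(m):
    """Map an arbitrary 3x3 matrix to the nearest rotation matrix via SVD."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

q = np.array([0.5, -0.5, 0.5, 0.5])
# q and -q are different vectors but encode the same rotation.
print(np.allclose(quat_to_matrix(q), quat_to_matrix(-q)))

# A raw 9-dimensional network output is generally not a rotation;
# projecting it restores orthogonality and unit determinant.
rng = np.random.default_rng(0)
raw = quat_to_matrix(q) + 0.1 * rng.normal(size=(3, 3))
r = project_to_so3(raw)
print(np.allclose(r @ r.T, np.eye(3)), np.isclose(np.linalg.det(r), 1.0))
```

Canonicalizing to one hemisphere trades a global two-to-one ambiguity for a sign flip at the w = 0 boundary, so it relocates the discontinuity rather than removing it.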

The most practical insight is that the optimal representation depends on the specific learning algorithm and the nature of the task. For example, algorithms that rely on smooth value functions (like SAC or TD3) may benefit from the local linearity of axis-angle representations, while policy gradient methods might tolerate the redundancy of rotation matrices better. The paper also suggests that learned representations, in which the network is allowed to discover its own internal encoding of orientation, may ultimately outperform hand-crafted ones, though this remains an open research direction.
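The "local linearity" of axis-angle can be sketched with the exponential map (Rodrigues' formula), which turns a 3-vector whose direction is the axis and whose norm is the angle into a rotation matrix. The names `skew` and `exp_map` are illustrative, and the small-angle threshold is an arbitrary choice for this example.

```python
import numpy as np

def skew(v):
    """Return the 3x3 skew-symmetric matrix K with K @ u == np.cross(v, u)."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def exp_map(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle) to rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    k = skew(rotvec)
    if theta < 1e-8:
        # First-order approximation near the identity (threshold is arbitrary).
        return np.eye(3) + k
    a = np.sin(theta) / theta
    b = (1.0 - np.cos(theta)) / theta**2
    return np.eye(3) + a * k + b * (k @ k)

# Near zero, the map is approximately linear in the action: R ~ I + skew(v).
v = np.array([1e-3, -2e-3, 5e-4])
print(np.allclose(exp_map(v), np.eye(3) + skew(v), atol=1e-5))

# The representation is still not one-to-one: adding 2*pi to the angle
# about the same axis yields the same rotation.
axis = np.array([0.0, 0.0, 1.0])
print(np.allclose(exp_map(0.5 * axis), exp_map((0.5 + 2 * np.pi) * axis)))
```

The first check shows why small axis-angle actions behave almost like ordinary vectors, while the second is a reminder that axis-angle has its own aliasing once angles grow large.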

Key Takeaways

  • No universal best representation exists: Euler angles, quaternions, rotation matrices, and axis-angle each introduce distinct failure modes (singularities, discontinuities, or dimensionality issues) that can destabilize DRL training.
  • Representation choice is a hyperparameter: Practitioners should treat orientation encoding as a design decision worthy of ablation studies, not as a fixed preprocessing step.
  • Algorithm-representation alignment matters: The smoothness and continuity requirements of the learning algorithm (e.g., value-based vs. policy-based) should guide the choice of SO(3) representation.
  • Learned representations are a promising frontier: Allowing networks to discover their own orientation encodings may bypass the limitations of hand-crafted parameterizations, though this approach is not yet mature.