Policy | 2026-05-14
Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
Source: arXiv cs.AI
arXiv:2605.13554v1 (announce type: cross)
Abstract: Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in achieving viable...
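The contrastive objective mentioned in the abstract can be illustrated with a generic InfoNCE-style critic loss. This is a minimal sketch under stated assumptions, not the paper's method: the function names, the inner-product score, and the use of other goals in the batch as negatives are all illustrative choices that the listing does not confirm.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_critic_loss(sa_reprs, goal_reprs):
    """InfoNCE-style loss over a batch (hypothetical helper, not from the paper).

    sa_reprs[i] is an embedding of a state-action pair and goal_reprs[i] is the
    embedding of the goal paired with it. Every other goal in the batch is
    treated as a negative. The inner product plays the role of a
    goal-conditioned score (assumption).
    """
    n = len(sa_reprs)
    total = 0.0
    for i in range(n):
        logits = [dot(sa_reprs[i], g) for g in goal_reprs]
        # Numerically stable log-sum-exp over all candidate goals.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching goal as the positive class.
        total += log_norm - logits[i]
    return total / n
```

With uninformative (all-zero) embeddings the loss equals the log of the batch size, and it approaches zero as each state-action embedding scores its own goal far above the others. No reward function appears anywhere in the loss, which is the property the abstract highlights.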
arxiv, papers, rl