BeClaude
Policy | 2026-05-14

Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation

Source: arXiv cs.AI

arXiv:2605.13554v1
Announce Type: cross

Abstract: Contrastive reinforcement learning (CRL) learns goal-conditioned Q-values through a contrastive objective over state-action and goal representations, removing the need for hand-crafted reward functions. Despite impressive success in achieving viable...
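
To make the idea of a contrastive objective over state-action and goal representations concrete, here is a minimal sketch of an InfoNCE-style critic loss. It is an illustration under stated assumptions, not the paper's method: the function names, the dot-product similarity, and the toy 2-D embeddings are all hypothetical, and the truncated abstract does not specify the actual loss or the proximal policy update.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_critic_loss(sa_embeddings, goal_embeddings):
    """InfoNCE-style loss (an assumed form, not taken from the paper).

    Row i of sa_embeddings is treated as the positive match for row i of
    goal_embeddings; every other goal in the batch acts as a negative.
    The similarity dot(phi(s, a), psi(g)) plays the role of a
    goal-conditioned critic value, so no reward function is needed.
    """
    n = len(sa_embeddings)
    total = 0.0
    for i in range(n):
        logits = [dot(sa_embeddings[i], g) for g in goal_embeddings]
        # Numerically stable log-sum-exp over all goals in the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching goal as the correct class.
        total += log_norm - logits[i]
    return total / n


# Toy, hand-written embeddings (hypothetical data for illustration only).
sa = [[1.0, 0.0], [0.0, 1.0]]
matched_goals = [[1.0, 0.0], [0.0, 1.0]]
mismatched_goals = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_critic_loss(sa, matched_goals)
shuffled_loss = contrastive_critic_loss(sa, mismatched_goals)
```

In this toy batch the loss is lower when each state-action embedding lines up with its own goal than when the goals are swapped, which is the signal a contrastive critic would be trained on in place of a hand-crafted reward.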

Tags: arxiv, papers, rl