Research · 2026-04-28
Polychromic Objectives for Reinforcement Learning
Source: arXiv cs.AI
arXiv:2509.25424v5 (Announce Type: replace-cross)
Abstract: Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. These pretrained policies, trained on large datasets, produce generations with a broad range of promising but unrefined...
arxiv · papers · rl