Research · 2026-05-12
Mid-Training with Self-Generated Data Improves Reinforcement Learning in Language Models
Source: arXiv cs.AI
arXiv:2605.08472v1 (Announce Type: new)
Abstract: The effectiveness of Reinforcement Learning (RL) in Large Language Models (LLMs) depends on the nature and diversity of the data used before and during RL. In particular, reasoning problems can often be approached in multiple ways that rely on...
arxiv, papers, rl