Research | 2026-04-20
Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
Source: arXiv cs.AI
arXiv:2509.25300v4 Announce Type: replace-cross Abstract: While scaling laws for large language models (LLMs) during pre-training have been extensively studied, their behavior under reinforcement learning (RL) post-training remains largely unexplored. This paper presents a systematic empirical...
Tags: arxiv, papers, reasoning, rl