Research | 2026-05-12
RL Fine-Tuning Heals OOD Forgetting in SFT
Source: Arxiv CS.AI
arXiv:2509.12235v3 | Announce Type: replace-cross
Abstract: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) is a standard post-training recipe for improving Large Language Model (LLM) reasoning, but why it works remains unclear. We revisit the common claim that "SFT memorizes,...
Tags: arxiv, papers, fine-tuning