BeClaude
Research · 2026-05-08

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum

Source: Arxiv CS.AI

arXiv:2604.25907v2 Announce Type: replace-cross

Abstract: SFT-then-RLVR is widely used for post-training reasoning models, but why this specific ordering works, and why RLVR alone stalls at a cold start, has lacked a unifying theoretical account. We provide that account under a unified loss family $J_Q$...
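The paper's exact definition of $J_Q$ is not included in this excerpt. As background for the "Tsallis loss continuum" named in the title, the sketch below implements the standard Tsallis q-logarithm, which such loss families are typically built on: it interpolates continuously in $q$ and recovers the natural logarithm (and hence the ordinary SFT cross-entropy term) as $q \to 1$. The `q_nll` helper is a hypothetical illustration, not the paper's loss.

```python
import math

def tsallis_log(x: float, q: float) -> float:
    """Tsallis q-logarithm: ln_q(x) = (x^(1-q) - 1) / (1 - q).

    Recovers the natural log in the limit q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_nll(prob: float, q: float) -> float:
    """Hypothetical q-deformed negative log-likelihood, -ln_q(p).

    At q = 1 this reduces to the standard per-token SFT
    cross-entropy; other q values deform how sharply the loss
    penalizes low-probability targets. Illustration only --
    the paper's J_Q may differ.
    """
    return -tsallis_log(prob, q)
```

Varying $q$ changes how quickly the loss "commits" to the supervised target distribution, which is the knob the title alludes to.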

Tags: arxiv · papers · reasoning · vision