Research 2026-04-24
RIFT: Repurposing Negative Samples via Reward-Informed Fine-Tuning
Source: arXiv cs.AI
arXiv:2601.09253v2. Abstract: While Supervised Fine-Tuning (SFT) and Rejection Sampling Fine-Tuning (RFT) are standard for LLM alignment, they either rely on costly expert data or discard valuable negative samples, leading to data inefficiency. To address this, we...
arxiv, papers, fine-tuning