Research · 2026-04-27
How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Source: Arxiv CS.AI
arXiv:2511.18903v2 Announce Type: replace-cross Abstract: Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticated data curation. A natural approach to better leverage high-quality data is...