BeClaude
Research 2026-05-12

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Source: arXiv cs.AI

arXiv:2505.18091v3. Abstract: Large Language Models (LLMs) are typically trained on data mixtures: most data come from web scrapes, while a small portion is curated from high-quality sources with dense domain-specific knowledge. In this paper, we show that when training...

arxiv, papers