Research 2026-04-28
MegaScale-Data: Scaling Dataloader for Multisource Large Foundation Model Training
Source: arXiv cs.AI
arXiv:2504.09844v4 Announce Type: replace-cross Abstract: Modern frameworks for training large foundation models (LFMs) employ dataloaders in a data-parallel manner, with each loader processing a disjoint subset of training data. When preparing data for LFM training that originates from multiple,...
arxiv papers