Research
2026-04-23
veScale-FSDP: Flexible and High-Performance FSDP at Scale
Source: arXiv cs.AI
arXiv:2602.22437v3
Announce Type: replace-cross
Abstract: Fully Sharded Data Parallel (FSDP), also known as Zero Redundancy Optimizer (ZeRO), is widely used for large-scale model training because of its memory efficiency and minimal intrusion on model code. However, existing FSDP systems rely on...