Research · 2026-05-12
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
Source: Arxiv CS.AI
arXiv:2503.05066v5 | Announce Type: replace-cross

Abstract: The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation to balance performance and efficiency. However, under expert parallelism, MoE suffers from inference...
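The straggler effect the abstract refers to can be illustrated with a minimal sketch. This is not the paper's method, only an assumed toy setup: a skewed top-k router assigns tokens to experts, and under expert parallelism (one expert per device) the per-step latency is bounded by the most heavily loaded expert. All names and the skew constant here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, top_k = 512, 8, 2

# Skewed router logits: expert 0 is systematically favored,
# mimicking the load imbalance that produces stragglers.
logits = rng.normal(size=(num_tokens, num_experts))
logits[:, 0] += 1.5

# Sparse activation: each token routes to its top-k experts.
topk_idx = np.argsort(logits, axis=1)[:, -top_k:]

# Tokens assigned to each expert.
load = np.bincount(topk_idx.ravel(), minlength=num_experts)

# Under expert parallelism, step latency tracks the busiest
# expert (the straggler), not the average expert.
mean_load = num_tokens * top_k / num_experts
straggler_ratio = load.max() / mean_load
print("per-expert load:", load)
print("straggler load vs. mean:", straggler_ratio)
```

In a balanced system `straggler_ratio` would be close to 1; the skewed router pushes it well above 1, which is the imbalance that capacity-aware inference schemes try to mitigate.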