Research · 2026-05-12

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts

Source: arXiv cs.AI

arXiv:2503.05066v5 (replace-cross)

Abstract: The Mixture of Experts (MoE) is an effective architecture for scaling large language models by leveraging sparse expert activation to balance performance and efficiency. However, under expert parallelism, MoE suffers from inference...
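The sketch below is a minimal illustration (not code from the paper) of the sparse expert activation the abstract refers to, assuming a standard top-k router; all names and sizes (num_experts, top_k, etc.) are illustrative. It shows how uneven token-to-expert assignment makes the busiest expert the straggler when, under expert parallelism, each expert runs on its own device and the layer must wait for the slowest one.

# Minimal sketch (not from the paper): top-k MoE gating and the resulting
# per-expert load imbalance. All parameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
num_tokens, hidden, num_experts, top_k = 512, 64, 8, 2

x = rng.normal(size=(num_tokens, hidden))        # token activations
w_gate = rng.normal(size=(hidden, num_experts))  # router weights

logits = x @ w_gate
# Each token activates only its top_k highest-scoring experts (sparse activation).
topk_idx = np.argsort(-logits, axis=1)[:, :top_k]

# Count tokens routed to each expert; with one expert per device, the layer's
# latency is set by the most heavily loaded expert (the straggler).
loads = np.bincount(topk_idx.ravel(), minlength=num_experts)
print("tokens per expert:", loads)
print("straggler load / mean load:", loads.max() / loads.mean())

Capacity-aware inference, as the title suggests, targets exactly this imbalance; the sketch only makes the load skew visible, it does not reproduce the paper's mitigation.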

Tags: arxiv, papers, rag