BeClaude
Research 2026-05-12

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

Source: Arxiv CS.AI

arXiv:2509.21892v2 (announce type: replace-cross)

Abstract: Mixture-of-Experts (MoE) models typically fix the number of activated experts $k$ at both training and inference. However, real-world deployments often face heterogeneous hardware, fluctuating workloads, and diverse quality-latency...
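For context, a minimal sketch of the standard top-k MoE routing the abstract refers to, where $k$ is ordinarily held fixed between training and inference. This is not the paper's Elastic MoE method; the class and parameter names (`ToyMoELayer`, `n_experts`, the choice of `k` values) are illustrative assumptions, and the final two lines only show that naively changing `k` at inference alters the expert mixture the router was trained with.

```python
# Sketch of plain top-k MoE routing (not the paper's Elastic MoE).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor, k: int = 2) -> torch.Tensor:
        # x: (batch, d_model); k = number of experts activated per token.
        logits = self.router(x)                               # (batch, n_experts)
        topk_vals, topk_idx = torch.topk(logits, k, dim=-1)   # choose k experts per token
        weights = F.softmax(topk_vals, dim=-1)                # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(k):
            idx = topk_idx[:, slot]                           # expert id per token for this slot
            w = weights[:, slot].unsqueeze(-1)
            for e in idx.unique():
                mask = idx == e
                out[mask] += w[mask] * self.experts[int(e)](x[mask])
        return out

layer = ToyMoELayer()
x = torch.randn(4, 64)
y_train_k = layer(x, k=2)   # the k the router was (conceptually) trained with
y_infer_k = layer(x, k=4)   # raising k at inference shifts the mixture distribution
```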

arxivpapers