Research · 2026-05-12
Uncovering Intra-expert Activation Sparsity for Efficient Mixture-of-Expert Model Execution
Source: Arxiv CS.AI
arXiv:2605.08575v1 Announce Type: cross Abstract: The Mixture of Experts (MoE) architecture has become the standard for state-of-the-art large language models, owing to its computational efficiency through sparse expert activation. However, sparsity through finer expert granularity is becoming...
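The sparse expert activation the abstract refers to can be sketched as a top-k router: a gating network scores all experts per token, and only the k highest-scoring experts run. The sketch below is a minimal illustration of that general mechanism, not the paper's method; all names and shapes are illustrative assumptions.

```python
import numpy as np

def topk_moe_forward(x, gate_w, expert_ws, k=2):
    """Sketch of a sparse MoE layer: route each token to its top-k experts.

    x: (tokens, d) inputs; gate_w: (d, n_experts) router weights;
    expert_ws: list of (d, d) expert weight matrices (hypothetical shapes).
    """
    logits = x @ gate_w                           # (tokens, n_experts) router scores
    topk = np.argsort(-logits, axis=1)[:, :k]     # indices of the k best experts per token
    sel = np.take_along_axis(logits, topk, axis=1)
    # softmax only over the selected experts' logits
    gates = np.exp(sel - sel.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                   # only k experts run per token,
        for slot in range(k):                     # which is the source of MoE's
            e = topk[t, slot]                     # computational efficiency
            out[t] += gates[t, slot] * (x[t] @ expert_ws[e])
    return out, topk

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 6))
experts = [rng.normal(size=(8, 8)) for _ in range(6)]
out, topk = topk_moe_forward(x, gate_w, experts, k=2)
```

Each token's output is a gate-weighted sum of just two of the six experts, so compute scales with k rather than with the total expert count.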