Research · 2026-05-14
REAP the Experts: Why Pruning Prevails for One-Shot MoE Compression
Source: arXiv cs.AI
arXiv:2510.13999v3 (announce type: replace-cross)
Abstract: Sparsely-activated Mixture-of-Experts (SMoE) models offer efficient pre-training and low latency, but their large parameter counts create significant memory overhead, motivating research into expert compression. Contrary to recent findings...
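To make "expert compression" concrete, below is a minimal sketch of one-shot expert pruning in an SMoE layer. This is not the paper's REAP method: the layer structure, the saliency score (mean router gate probability over calibration tokens), and all names (`MoELayer`, `prune_experts`) are illustrative assumptions. Experts are dropped in a single shot, with no retraining.

```python
# Hypothetical sketch of one-shot expert pruning in a Mixture-of-Experts layer.
# Saliency rule (mean router probability) is an assumption, not the paper's REAP score.
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    """A toy top-k routed SMoE layer: a linear router over per-token experts."""

    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is sent to its top-k experts,
        # and expert outputs are mixed by the router's gate probabilities.
        gates = self.router(x).softmax(dim=-1)        # (tokens, n_experts)
        topv, topi = gates.topk(self.top_k, dim=-1)   # (tokens, top_k)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topv[mask, k, None] * self.experts[e](x[mask])
        return out


@torch.no_grad()
def prune_experts(layer: MoELayer, calib: torch.Tensor, keep: int) -> MoELayer:
    """One-shot pruning: keep the `keep` experts with the highest mean gate mass
    on calibration tokens, drop the rest, and shrink the router to match."""
    gates = layer.router(calib).softmax(dim=-1)       # (tokens, n_experts)
    saliency = gates.mean(dim=0)                      # average router probability
    kept = saliency.topk(keep).indices.sort().values  # surviving expert indices
    layer.experts = nn.ModuleList(layer.experts[i] for i in kept.tolist())
    # Keep only the surviving experts' router rows; softmax renormalizes over them.
    layer.router.weight = nn.Parameter(layer.router.weight[kept])
    layer.router.out_features = keep
    layer.top_k = min(layer.top_k, keep)
    return layer


# Usage: prune an 8-expert layer down to 4 using random calibration tokens.
layer = MoELayer(d_model=64, n_experts=8)
calib = torch.randn(1024, 64)
pruned = prune_experts(layer, calib, keep=4)
```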