Research · 2026-04-23
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Source: arXiv cs.AI
arXiv:2604.19835v1 Announce Type: cross

Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active...
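To make the decoupling concrete: in a sparse MoE layer, every expert contributes to the total parameter count, but each token is routed to only the top-k experts, so per-token FLOPs scale with k rather than with the number of experts. The sketch below is a minimal illustration of that idea in plain NumPy, not code from the paper; all sizes (`d_model`, `d_ff`, `num_experts`, `top_k`) and the function name `moe_layer` are assumptions chosen for clarity.

```python
# Minimal sketch (illustrative, not the paper's implementation) of top-k
# sparse expert routing: total parameters grow with num_experts, while
# per-token compute grows only with top_k.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
num_experts, top_k = 8, 2          # assumed sizes, for illustration only

# Router and expert weights; total parameters scale with num_experts.
W_router = rng.normal(size=(d_model, num_experts))
W_in = rng.normal(size=(num_experts, d_model, d_ff)) / np.sqrt(d_model)
W_out = rng.normal(size=(num_experts, d_ff, d_model)) / np.sqrt(d_ff)

def moe_layer(x):
    """x: (tokens, d_model). Per-token compute scales with top_k, not num_experts."""
    logits = x @ W_router                           # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k chosen experts
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = topk[t, slot]
            h = np.maximum(x[t] @ W_in[e], 0.0)     # expert FFN with ReLU
            out[t] += gates[t, slot] * (h @ W_out[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)                      # (4, 64)
```

With these assumed sizes, each token touches only 2 of the 8 experts' FFN weights per forward pass, which is the "fixed active parameters" regime the abstract's scaling-law discussion refers to.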