Research · 2026-04-23
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
Source: arXiv cs.AI
arXiv:2604.19835v1 Announce Type: cross

Abstract: Mixture-of-Experts (MoE) has become the dominant architecture for scaling large language models: frontier models routinely decouple total parameters from per-token computation through sparse expert routing. Scaling laws show that under fixed active...
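To make the decoupling concrete: in a sparse MoE layer, every expert contributes to the total parameter count, but each token is routed to only the top-k experts, so per-token FLOPs scale with k rather than with the number of experts. The sketch below is a minimal illustration of that idea in plain NumPy, not code from the paper; all sizes (`d_model`, `d_ff`, `num_experts`, `top_k`) and the function name `moe_layer` are assumptions chosen for clarity.

```python
# Minimal sketch (illustrative, not the paper's implementation) of top-k
# sparse expert routing: total parameters grow with num_experts, while
# per-token compute grows only with top_k.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
num_experts, top_k = 8, 2          # assumed sizes, for illustration only

# Router and expert weights; total parameters scale with num_experts.
W_router = rng.normal(size=(d_model, num_experts))
W_in = rng.normal(size=(num_experts, d_model, d_ff)) / np.sqrt(d_model)
W_out = rng.normal(size=(num_experts, d_ff, d_model)) / np.sqrt(d_ff)

def moe_layer(x):
    """x: (tokens, d_model). Per-token compute scales with top_k, not num_experts."""
    logits = x @ W_router                           # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the k chosen experts
    # Softmax over only the selected experts' logits.
    sel = np.take_along_axis(logits, topk, axis=-1)
    gates = np.exp(sel - sel.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(top_k):
            e = topk[t, slot]
            h = np.maximum(x[t] @ W_in[e], 0.0)     # expert FFN with ReLU
            out[t] += gates[t, slot] * (h @ W_out[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)                      # (4, 64)
```

With these assumed sizes, each token touches only 2 of the 8 experts' FFN weights per forward pass, which is the "fixed active parameters" regime the abstract's scaling-law discussion refers to.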