Research · 2026-05-12
Mixture of Layers with Hybrid Attention
Source: arXiv cs.AI
arXiv:2605.09516v1 (Announce Type: cross)
Abstract: Standard Mixture-of-Experts (MoE) transformers route tokens to expert subnetworks within each layer, but the layer structure itself remains monolithic. We introduce Mixture of Layers (MoL), which replaces full-width transformer blocks (d_model) with...
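For context, the baseline the abstract contrasts against is standard per-layer expert routing: a router scores each token and dispatches it to a small number of expert feed-forward networks inside a single layer. The sketch below is a minimal top-k token-routing MoE layer in PyTorch; it does not implement the paper's MoL mechanism (whose details are truncated above), and the class name, expert count, and other hyperparameters are illustrative assumptions only.

```python
# Minimal sketch of standard per-layer top-k MoE token routing (the baseline
# described in the abstract), NOT the paper's Mixture of Layers (MoL).
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Route each token to its top-k expert FFNs within a single layer."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        gate_logits = self.router(tokens)                # (T, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize gate weights

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoELayer(d_model=64, d_ff=256, n_experts=4, k=2)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

Note that in this baseline every expert operates at the full model width (d_model), which is the "monolithic" layer structure the abstract says MoL departs from.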