Research · 2026-05-12
Mixture of Layers with Hybrid Attention
Source: arXiv cs.AI
arXiv:2605.09516v1 (Announce Type: cross)
Abstract: Standard Mixture-of-Experts (MoE) transformers route tokens to expert subnetworks within each layer, but the layer structure itself remains monolithic. We introduce Mixture of Layers (MoL), which replaces full-width transformer blocks (d_model) with...
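For context, the baseline the abstract contrasts against is standard per-layer expert routing: a router scores each token and dispatches it to a small number of expert feed-forward networks inside a single layer. The sketch below is a minimal top-k token-routing MoE layer in PyTorch; it does not implement the paper's MoL mechanism (whose details are truncated above), and the class name, expert count, and other hyperparameters are illustrative assumptions only.

```python
# Minimal sketch of standard per-layer top-k MoE token routing (the baseline
# described in the abstract), NOT the paper's Mixture of Layers (MoL).
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Route each token to its top-k expert FFNs within a single layer."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens for routing
        tokens = x.reshape(-1, x.shape[-1])
        gate_logits = self.router(tokens)                # (T, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize gate weights

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoELayer(d_model=64, d_ff=256, n_experts=4, k=2)
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

Note that in this baseline every expert operates at the full model width (d_model), which is the "monolithic" layer structure the abstract says MoL departs from.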