BeClaude
Research 2026-05-14

N-vium: Mixture-of-Exits Transformer for Accelerated Exact Generation

Source: Arxiv CS.AI

arXiv:2605.13190v1. Announce Type: cross.

Abstract: Improving the inference efficiency of autoregressive transformers typically means reducing FLOPs per token, usually through approximations that degrade model quality. We introduce N-vium, a mixture-of-exits transformer that partially parallelizes...

arxiv, papers