BeClaude
Research 2026-04-22

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

Source: Arxiv CS.AI

arXiv:2604.19147v1 Announce Type: cross Abstract: Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottleneck in the attention mechanism's linear...
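The abstract cuts off at the attention mechanism's linear component, and the paper's actual expansion method is not shown here. For context, the standard attention layer the abstract refers to derives queries, keys, and values from fixed linear projections of the input; the sketch below (plain NumPy, hypothetical shapes) shows why naively growing a model means resizing these weight matrices, which is the kind of change that can discard learned representations.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    # Standard attention: Q, K, V are *linear* projections of the input.
    # Expanding d_model or d_head requires resizing Wq/Wk/Wv, so a naive
    # scale-up cannot simply reuse the trained weights.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity logits
    # Row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
seq, d_model, d_head = 4, 8, 8
X = rng.standard_normal((seq, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

This is textbook scaled dot-product attention, not the paper's proposed nonlinear expansion, which is truncated in the abstract above.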

arxivpapers