Research 2026-04-22
Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling
Source: arXiv cs.AI
arXiv:2604.19147v1 | Announce Type: cross
Abstract: Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottleneck in the attention mechanism's linear...
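The abstract cuts off before describing the method, so the sketch below is not the paper's Nonlinear Attention Expansion. It is a minimal, generic illustration of the "inheritable scaling" idea the abstract gestures at: widening a trained attention layer without discarding its learned representations, here by adding heads whose output-projection columns start at zero so the expanded layer initially computes the same function. The class and function names (SimpleAttention, expand_heads) and all hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

class SimpleAttention(nn.Module):
    """Minimal multi-head self-attention with explicit Q/K/V/output projections.
    (Hypothetical stand-in; not the paper's architecture.)"""
    def __init__(self, d_model: int, n_heads: int, head_dim: int):
        super().__init__()
        inner = n_heads * head_dim
        self.n_heads, self.head_dim = n_heads, head_dim
        self.q = nn.Linear(d_model, inner, bias=False)
        self.k = nn.Linear(d_model, inner, bias=False)
        self.v = nn.Linear(d_model, inner, bias=False)
        self.out = nn.Linear(inner, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        def split(t):  # (b, s, inner) -> (b, heads, seq, head_dim)
            return t.view(b, s, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out(y)

@torch.no_grad()
def expand_heads(old: SimpleAttention, extra_heads: int) -> SimpleAttention:
    """Widen attention by adding heads without changing the computed function:
    copy the old Q/K/V/out weights, and zero the output-projection columns
    belonging to the new heads so they contribute nothing until trained."""
    d_model = old.q.in_features
    new = SimpleAttention(d_model, old.n_heads + extra_heads, old.head_dim)
    rows = old.n_heads * old.head_dim
    for proj_new, proj_old in ((new.q, old.q), (new.k, old.k), (new.v, old.v)):
        proj_new.weight[:rows] = proj_old.weight  # inherit old heads; new heads
        # keep their random init, which is harmless while out-proj masks them
    new.out.weight.zero_()
    new.out.weight[:, :rows] = old.out.weight     # inherit old heads, mask new
    return new
```

Under these assumptions the expansion is exactly function-preserving at initialization, which is one standard way (in the spirit of Net2Net-style growth) to scale without retraining from scratch:

```python
x = torch.randn(2, 5, 64)
old = SimpleAttention(d_model=64, n_heads=4, head_dim=16)
new = expand_heads(old, extra_heads=4)
assert torch.allclose(old(x), new(x), atol=1e-6)  # same outputs before fine-tuning
```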