BeClaude
Research 2026-04-22

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

Source: Arxiv CS.AI

arXiv:2604.19147v1 Announce Type: cross Abstract: Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottleneck in the attention mechanism's linear...
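The abstract cuts off at the attention mechanism's linear component, and the paper's actual expansion method is not shown here. For context, the standard attention layer the abstract refers to derives queries, keys, and values from fixed linear projections of the input; the sketch below (plain NumPy, hypothetical shapes) shows why naively growing a model means resizing these weight matrices, which is the kind of change that can discard learned representations.

```python
import numpy as np

def scaled_dot_product_attention(X, Wq, Wk, Wv):
    # Standard attention: Q, K, V are *linear* projections of the input.
    # Expanding d_model or d_head requires resizing Wq/Wk/Wv, so a naive
    # scale-up cannot simply reuse the trained weights.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) similarity logits
    # Row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Hypothetical sizes for illustration only.
rng = np.random.default_rng(0)
seq, d_model, d_head = 4, 8, 8
X = rng.standard_normal((seq, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
out = scaled_dot_product_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

This is textbook scaled dot-product attention, not the paper's proposed nonlinear expansion, which is truncated in the abstract above.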

arxivpapers