Research · 2026-05-12
Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
Source: arXiv cs.AI
arXiv:2605.09403v1 (announce type: cross)

Abstract: Architectural choices inside the Transformer feedforward network (FFN) block do not merely affect the block itself; they reshape the computations learned by the rest of the model. We study this effect in one-layer Transformers trained on digit...
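For context on the component the abstract discusses, here is a minimal sketch of a standard Transformer FFN block (two linear maps with a ReLU in between). This is a generic illustration, not the paper's specific setup; the dimensions and weights are arbitrary assumptions for the example.

```python
import numpy as np

def ffn_block(x, w1, b1, w2, b2):
    """Standard Transformer FFN block: expand, apply ReLU, project back.

    ReLU zeroes negative pre-activations, so the hidden representation
    is typically sparse; width and activation choices here are the kind
    of FFN design axis the abstract refers to.
    """
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU induces sparsity in h
    return h @ w2 + b2

# Tiny example with hypothetical sizes: model dim 4, hidden dim 8.
rng = np.random.default_rng(0)
d_model, d_ff = 4, 8
w1 = rng.standard_normal((d_model, d_ff))
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model))
b2 = np.zeros(d_model)

x = rng.standard_normal((3, d_model))   # a batch of 3 token vectors
y = ffn_block(x, w1, b1, w2, b2)
print(y.shape)  # the FFN preserves the model dimension
```

Note that the block maps each token vector independently; any cross-token interaction in a Transformer layer comes from the attention sublayer, which is why FFN design choices influencing attention (as the abstract claims) is a nontrivial effect.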