Research · 2026-05-12
Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
Source: arXiv cs.AI
arXiv:2605.09403v1 (announce type: cross)

Abstract: Architectural choices inside the Transformer feedforward network (FFN) block do not merely affect the block itself; they reshape the computations learned by the rest of the model. We study this effect in one-layer Transformers trained on digit...
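For context on the component the abstract discusses, here is a minimal sketch of a standard Transformer FFN block (two linear maps with a ReLU in between). This is a generic illustration, not the paper's specific setup; the dimensions and weights are arbitrary assumptions for the example.

```python
import numpy as np

def ffn_block(x, w1, b1, w2, b2):
    """Standard Transformer FFN block: expand, apply ReLU, project back.

    ReLU zeroes negative pre-activations, so the hidden representation
    is typically sparse; width and activation choices here are the kind
    of FFN design axis the abstract refers to.
    """
    h = np.maximum(x @ w1 + b1, 0.0)  # ReLU induces sparsity in h
    return h @ w2 + b2

# Tiny example with hypothetical sizes: model dim 4, hidden dim 8.
rng = np.random.default_rng(0)
d_model, d_ff = 4, 8
w1 = rng.standard_normal((d_model, d_ff))
b1 = np.zeros(d_ff)
w2 = rng.standard_normal((d_ff, d_model))
b2 = np.zeros(d_model)

x = rng.standard_normal((3, d_model))   # a batch of 3 token vectors
y = ffn_block(x, w1, b1, w2, b2)
print(y.shape)  # the FFN preserves the model dimension
```

Note that the block maps each token vector independently; any cross-token interaction in a Transformer layer comes from the attention sublayer, which is why FFN design choices influencing attention (as the abstract claims) is a nontrivial effect.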