Research · 2026-06-19

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

arXiv:2606.19379v1 Announce Type: cross Abstract: Transformer feed-forward networks (FFNs) are often treated as nonlinear stores of computation, yet how nonlinear a trained FFN block actually is has rarely been measured. We treat each FFN as a position-wise input-to-output map and split it into the...

What Happened

A new arXiv preprint challenges a common assumption about Transformer architectures: that feed-forward networks (FFNs) are inherently nonlinear computational blocks. The authors treat each FFN as a position-wise input-to-output map and measure how well a linear approximation recovers its output, asking whether the observed linearity stems from architectural design or emerges during training. Their headline finding is that per-block linear recoverability is learned, not baked into the architecture itself. In other words, trained FFNs can often be approximated by linear functions with surprising fidelity, especially in later layers, which complicates the common picture of FFNs as irreducibly nonlinear "key-value memories."
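To make "linear recoverability" concrete, here is a minimal sketch of one way to measure it: fit an affine map from FFN inputs to FFN outputs by least squares and report the held-out R². This is an illustration under stated assumptions, not the paper's procedure. The metric, the function names, and the randomly initialized toy FFN are all hypothetical, and the score of an untrained toy block says nothing about trained models.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, w_in, b_in, w_out, b_out):
    # Standard two-layer position-wise FFN: expand, apply nonlinearity, project back.
    return gelu(x @ w_in + b_in) @ w_out + b_out

def fit_affine(x, y):
    # Least-squares affine fit; the bias is handled by appending a column of ones.
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef

def apply_affine(coef, x):
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    return x1 @ coef

def r2(y, y_hat):
    # Fraction of output variance explained, pooled over all output dimensions.
    resid = np.sum((y - y_hat) ** 2)
    total = np.sum((y - y.mean(axis=0)) ** 2)
    return 1.0 - resid / total

# Toy setup (assumption): random weights and Gaussian inputs stand in for a
# real block and real activations collected from a model.
rng = np.random.default_rng(0)
d_model, d_ff, n = 16, 64, 4000
w_in = rng.normal(scale=d_model ** -0.5, size=(d_model, d_ff))
b_in = np.zeros(d_ff)
w_out = rng.normal(scale=d_ff ** -0.5, size=(d_ff, d_model))
b_out = np.zeros(d_model)

x = rng.normal(size=(n, d_model))
y = ffn(x, w_in, b_in, w_out, b_out)

# Fit on one split, score on the other, so the number reflects generalization.
coef = fit_affine(x[:3000], y[:3000])
score = r2(y[3000:], apply_affine(coef, x[3000:]))
print(f"held-out linear R^2: {score:.3f}")
```

In practice, `x` and `y` would be the recorded inputs and outputs of one FFN block across many token positions, and the score would be computed separately for each block.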

Why It Matters

This result has three main implications. First, it reframes our understanding of what FFNs actually do. The prevailing view, popularized by the "memory" interpretation in which each neuron stores a specific pattern, implies strong nonlinearity. If FFNs are largely linear in practice, that metaphor may be misleading. Second, it suggests that much of the "computation" in a Transformer might be simpler than assumed. If a linear map can recover an FFN's output per position, then much of what the FFN blocks contribute may amount to linear transformations, with nonlinearity playing a more subtle role, perhaps only in early layers or for specific tokens. Third, it opens the door to more efficient inference: if FFN blocks can be replaced or compressed with linear approximations without significant performance loss, we could see faster, cheaper models.
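The efficiency argument can be made concrete with back-of-the-envelope arithmetic. The sketch below compares the weight count of a standard FFN with that of a single d_model × d_model linear map. The 4× expansion ratio is a common convention assumed here, not a figure from the paper; biases are ignored; and the function names are hypothetical.

```python
def ffn_weight_count(d_model, d_ff, gated=False):
    # A plain FFN has two weight matrices (up and down projection);
    # gated variants such as SwiGLU add a third (the gate projection).
    n_matrices = 3 if gated else 2
    return n_matrices * d_model * d_ff

def linear_weight_count(d_model):
    # A single dense linear replacement mapping d_model -> d_model.
    return d_model * d_model

d_model = 1024
d_ff = 4 * d_model  # assumed expansion ratio
ratio = ffn_weight_count(d_model, d_ff) / linear_weight_count(d_model)
print(ratio)  # 8.0
```

Since matrix-multiply cost per token scales with the weight count, the same ratio is a rough upper bound on the per-block compute saved, and only if the replacement preserves quality.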

Implications for AI Practitioners

For engineers and researchers working with Transformers, this finding offers practical guidance. When fine-tuning or pruning models, one might prioritize preserving nonlinearity in early layers while aggressively simplifying later FFN blocks. The paper also hints that current interpretability methods, which often treat FFN neurons as discrete concept detectors, may be overcomplicating reality. A more productive approach could involve analyzing the learned linear subspaces of each block. Additionally, model designers might reconsider whether nonlinear FFN variants (such as GELU activations or SwiGLU gating) are necessary in all layers, or whether a hybrid architecture with linear FFNs in deeper layers could reduce compute without sacrificing quality.

However, caution is warranted. The study measures "recoverability" by fitting a linear approximation, which shows only that such an approximation works well on the tested inputs, not that the FFN's function is linear in all contexts. Practitioners should validate on their own tasks before assuming linearity holds universally.
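The caveat above can be illustrated with a toy example: a linear fit that looks near-perfect on one input distribution can fail badly on another. This sketch is an assumption-laden illustration, not an experiment from the paper. It uses a small random tanh network as a stand-in for an FFN because tanh is nearly linear for small inputs and saturates for large ones, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 32
w1 = rng.normal(scale=d ** -0.5, size=(d, hidden))
w2 = rng.normal(scale=hidden ** -0.5, size=(hidden, d))

def toy_block(x):
    # Stand-in nonlinear block (assumption): tanh instead of a real FFN activation.
    return np.tanh(x @ w1) @ w2

def fit_affine(x, y):
    # Least-squares affine fit with a bias column.
    x1 = np.hstack([x, np.ones((x.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(x1, y, rcond=None)
    return coef

def r2(coef, x, y):
    # Fraction of output variance explained by the affine fit on (x, y).
    y_hat = np.hstack([x, np.ones((x.shape[0], 1))]) @ coef
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean(axis=0)) ** 2)

# Fit on small-magnitude inputs, where tanh is close to the identity.
x_fit = 0.1 * rng.normal(size=(2000, d))
coef = fit_affine(x_fit, toy_block(x_fit))

# Evaluate on fresh inputs from the same distribution and on much larger inputs.
x_same = 0.1 * rng.normal(size=(2000, d))
x_shift = 3.0 * rng.normal(size=(2000, d))
r2_in = r2(coef, x_same, toy_block(x_same))
r2_shift = r2(coef, x_shift, toy_block(x_shift))
print(f"in-distribution R^2: {r2_in:.4f}, shifted R^2: {r2_shift:.4f}")
```

The same pattern applies to a real model: fit the linear replacement on activations from one dataset, then score it on activations from the target task before relying on it.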

Key Takeaways

Transformer FFN blocks can often be approximated by linear functions, especially in deeper layers, challenging the view that they are inherently nonlinear "memory" stores.
Linear recoverability is a learned property, not a given of the architecture, meaning training dynamics shape how nonlinear each block becomes.
Practitioners may achieve efficiency gains by compressing or replacing later FFN layers with linear approximations, but should verify on specific tasks.
Interpretability efforts should consider that FFN behavior may be simpler than previously thought, potentially shifting focus from individual neurons to linear subspaces.

