Research · 2026-04-27
Beyond Linearity in Attention Projections: The Case for Nonlinear Queries
Source: arXiv cs.AI
arXiv:2603.13381v2 (announce type: replace-cross)

Abstract: Recent algebraic analysis shows that in decoder-only and encoder-only transformers, the query projection $W_Q$ may be set to the identity without noticeable performance deterioration. This is possible because attention depends on $X$ only...
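The algebraic point behind the abstract is that attention logits depend on $W_Q$ and $W_K$ only through the product $W_Q W_K^\top$, since $(X W_Q)(X W_K)^\top = X\,(W_Q W_K^\top)\,X^\top$. A minimal NumPy sketch (not from the paper; shapes and names are illustrative) showing that $W_Q$ can be absorbed into a merged key projection $W_K' = W_K W_Q^\top$, leaving the logits unchanged when $W_Q$ is replaced by the identity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8  # sequence length, model dimension (illustrative)
X = rng.normal(size=(n, d))
W_Q = rng.normal(size=(d, d))
W_K = rng.normal(size=(d, d))

# Standard attention logits: (X W_Q)(X W_K)^T = X (W_Q W_K^T) X^T
logits = (X @ W_Q) @ (X @ W_K).T

# Absorb W_Q into the key projection, then set W_Q to the identity
W_K_merged = W_K @ W_Q.T
logits_merged = (X @ np.eye(d)) @ (X @ W_K_merged).T

# The two formulations produce identical attention logits
assert np.allclose(logits, logits_merged)
```

This only shows why setting $W_Q$ to the identity is expressible without loss; the paper's contribution concerns what nonlinear query projections add beyond this.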