BeClaude Research
2026-05-07

Gated Subspace Inference for Transformer Acceleration

Source: Arxiv CS.AI

arXiv:2605.03109v1 (Announce Type: cross)

Abstract: A method is presented for accelerating inference in transformer language models by exploiting the low effective rank of the token activation manifold at each layer. The method decomposes each activation vector into a subspace component and a...
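The abstract is truncated, so the paper's actual gating and decomposition details are unknown. As a rough illustration of the underlying idea, the sketch below (an assumption, not the paper's method) estimates a low-rank subspace from a batch of toy "activation" vectors via SVD and splits each vector into a subspace component plus a residual:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": 256 token vectors of dimension 64 that mostly live
# in an 8-dimensional subspace (a hypothetical stand-in for a layer's output).
basis = rng.standard_normal((64, 8))
acts = rng.standard_normal((256, 8)) @ basis.T \
    + 0.01 * rng.standard_normal((256, 64))

# Estimate a rank-k basis for the activation subspace via SVD.
k = 8
_, _, vt = np.linalg.svd(acts, full_matrices=False)
P = vt[:k].T  # (64, k) orthonormal basis of the estimated subspace

# Decompose each activation into a subspace component and a residual.
subspace_part = acts @ P @ P.T
residual = acts - subspace_part

# Because the toy data is nearly rank-8, the residual energy is small.
ratio = np.linalg.norm(residual) / np.linalg.norm(acts)
print(ratio)
```

If activations really concentrate in such a subspace, downstream computation can act on the k-dimensional coordinates instead of the full vectors; how the paper gates between the two components is not recoverable from the truncated abstract.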

arxivpapers