BeClaude Research
2026-05-07

Gated Subspace Inference for Transformer Acceleration

Source: Arxiv CS.AI

arXiv:2605.03109v1 (Announce Type: cross)

Abstract: A method is presented for accelerating inference in transformer language models by exploiting the low effective rank of the token activation manifold at each layer. The method decomposes each activation vector into a subspace component and a...
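The abstract is truncated, so the paper's actual gating and decomposition details are unknown. As a rough illustration of the underlying idea, the sketch below (an assumption, not the paper's method) estimates a low-rank subspace from a batch of toy "activation" vectors via SVD and splits each vector into a subspace component plus a residual:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": 256 token vectors of dimension 64 that mostly live
# in an 8-dimensional subspace (a hypothetical stand-in for a layer's output).
basis = rng.standard_normal((64, 8))
acts = rng.standard_normal((256, 8)) @ basis.T \
    + 0.01 * rng.standard_normal((256, 64))

# Estimate a rank-k basis for the activation subspace via SVD.
k = 8
_, _, vt = np.linalg.svd(acts, full_matrices=False)
P = vt[:k].T  # (64, k) orthonormal basis of the estimated subspace

# Decompose each activation into a subspace component and a residual.
subspace_part = acts @ P @ P.T
residual = acts - subspace_part

# Because the toy data is nearly rank-8, the residual energy is small.
ratio = np.linalg.norm(residual) / np.linalg.norm(acts)
print(ratio)
```

If activations really concentrate in such a subspace, downstream computation can act on the k-dimensional coordinates instead of the full vectors; how the paper gates between the two components is not recoverable from the truncated abstract.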

arxivpapers