Research 2026-05-12

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

arXiv:2605.08740v1 (announce type: cross)

Abstract: Sparse autoencoders (SAEs) decompose transformer residual streams into interpretable feature dictionaries, yet the relationship between SAE width and causal influence on model output has not been systematically characterised. We introduce causal...

Read Original Article on Arxiv CS.AI
