BeClaude
Research 2026-05-12

Causal Dimensionality of Transformer Representations: Measurement, Scaling, and Layer Structure

Source: arXiv cs.AI

arXiv:2605.08740v1
Announce Type: cross
Abstract: Sparse autoencoders (SAEs) decompose transformer residual streams into interpretable feature dictionaries, yet the relationship between SAE width and causal influence on model output has not been systematically characterised. We introduce causal...
