Research2026-05-08

Feature Starvation as Geometric Instability in Sparse Autoencoders

arXiv:2605.05341v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation...

Read Original Article on Arxiv CS.AI

arxivpapersstability-ai