Research 2026-05-08
Feature Starvation as Geometric Instability in Sparse Autoencoders
Source: Arxiv CS.AI
arXiv:2605.05341v1 Announce Type: cross
Abstract: Sparse autoencoders (SAEs) are used to disentangle the dense, polysemantic internal representations of large language models (LLMs) into interpretable, monosemantic concepts. However, standard $\ell_1$-regularized SAEs suffer from feature starvation...