Research 2026-04-23
A Unified Theory of Sparse Dictionary Learning in Mechanistic Interpretability: Piecewise Biconvexity and Spurious Minima
Source: arXiv cs.AI
arXiv:2512.05534v5 Announce Type: replace-cross
Abstract: As AI models achieve remarkable capabilities across diverse domains, understanding what representations they learn and how they encode concepts has become increasingly important for both scientific progress and trustworthy deployment. Recent...
arxiv, papers