Research, 2026-05-12
Data-driven Circuit Discovery for Interpretability of Language Models
Source: arXiv cs.AI
arXiv:2605.09129v1
Announce Type: new
Abstract: Circuit discovery aims to explain how language models (LMs) implement a specific task by localizing and interpreting a circuit, a computational subgraph responsible for the LM's behavior. Existing circuit discovery methods are hypothesis-driven; they...