Research · 2026-05-12

Data-driven Circuit Discovery for Interpretability of Language Models

arXiv:2605.09129v1. Abstract: Circuit discovery aims to explain how language models (LMs) implement a specific task by localizing and interpreting a circuit, a computational subgraph responsible for the LM's behavior. Existing circuit discovery methods are hypothesis-driven; they...

Read the original article on arXiv cs.AI
