Research | 2026-04-17

Weight Patching: Toward Source-Level Mechanistic Localization in LLMs

arXiv:2604.13694v1 (announce type: new)

Abstract: Mechanistic interpretability seeks to localize model behavior to the internal components that causally realize it. Prior work has advanced activation-space localization and causal tracing, but modules that appear important in activation space may...

Read the original article on arXiv (cs.AI)
