Research 2026-05-08
Patch-Effect Graph Kernels for LLM Interpretability
Source: arXiv cs.AI
arXiv:2605.06480v1 Announce Type: new
Abstract: Mechanistic interpretability aims to reverse-engineer transformer computations by identifying causal circuits through activation patching. However, scaling these interventions across diverse prompts and task families produces high-dimensional,...