Research | 2026-05-06
Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction
Source: arXiv cs.AI
arXiv:2605.02234v1
Announce Type: new
Abstract: We present a method for diagnosing interpretation in neural networks by identifying an input subspace where a proposed interpretation is highly faithful. Our method is particularly useful for causal-abstraction-style interpretability, where a...