Research | 2026-05-06

Bucketing the Good Apples: A Method for Diagnosing and Improving Causal Abstraction

arXiv:2605.02234v1 (Announce Type: new)

Abstract: We present a method for diagnosing interpretations of neural networks by identifying an input subspace in which a proposed interpretation is highly faithful. Our method is particularly useful for causal-abstraction-style interpretability, where a...

Source: arXiv (cs.AI)
