BeClaude
Research | 2026-05-11

Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

Source: arXiv cs.AI

arXiv:2512.23032v2 (announce type: replace-cross)

Abstract: Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric adopts a narrow notion of faithfulness and confuses unfaithfulness with...
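The labeling rule the abstract attributes to the Biasing Features metric can be illustrated with a minimal sketch. This is not the paper's or the prior work's implementation. The `HintedExample` record, its field names, the keyword-matching check for verbalization, and the reading of "affected the prediction" as "the answer flipped to the hinted option" are all assumptions made here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class HintedExample:
    """Hypothetical record for one question asked with and without a hint."""
    answer_without_hint: str        # model's answer to the plain prompt
    answer_with_hint: str           # model's answer to the hint-injected prompt
    hint_answer: str                # the option the injected hint points to
    cot_with_hint: str              # chain-of-thought produced for the hinted prompt
    hint_keywords: tuple[str, ...]  # assumed cues that count as mentioning the hint


def hint_affected_prediction(ex: HintedExample) -> bool:
    # Assumption: the hint "affected" the prediction if the answer changed
    # and the new answer is the one the hint suggested.
    return (ex.answer_without_hint != ex.answer_with_hint
            and ex.answer_with_hint == ex.hint_answer)


def cot_verbalizes_hint(ex: HintedExample) -> bool:
    # Assumption: crude case-insensitive keyword match stands in for
    # whatever verbalization check a real evaluation would use.
    text = ex.cot_with_hint.lower()
    return any(kw.lower() in text for kw in ex.hint_keywords)


def label_unfaithful(ex: HintedExample) -> bool:
    # Rule as described in the abstract: unfaithful if the hint affected
    # the prediction but the CoT omits it.
    return hint_affected_prediction(ex) and not cot_verbalizes_hint(ex)


if __name__ == "__main__":
    silent = HintedExample(
        answer_without_hint="A",
        answer_with_hint="B",
        hint_answer="B",
        cot_with_hint="Option B fits the evidence best.",
        hint_keywords=("professor", "hint"),
    )
    print(label_unfaithful(silent))
```

Under this rule, a CoT that changes its answer to the hinted option without mentioning the hint is labeled unfaithful, which is the binary judgment the abstract describes as a narrow notion of faithfulness.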

arxiv, papers