Research 2026-04-24
Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI
Source: arXiv cs.AI
arXiv:2604.20972v1 (Announce Type: new)
Abstract: Content moderation systems are typically evaluated by measuring agreement with human labels. In rule-governed environments, this assumption fails: multiple decisions may be logically consistent with the governing policy, and agreement metrics penalize...
arxiv, papers