Research 2026-04-24
Intent Laundering: AI Safety Datasets Are Not What They Seem
Source: arXiv cs.AI
arXiv:2602.16729v3. Announce Type: replace-cross.
Abstract: We systematically evaluate the quality of widely used adversarial safety datasets from two perspectives: in isolation and in practice. In isolation, we examine how well these datasets reflect real-world adversarial attacks based on three...
arxiv papers safety