Research · 2026-04-28
Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
Source: arXiv cs.AI
arXiv:2604.16286v2 (announce type: replace)

Abstract: As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce Auditing Sabotage Bench, a benchmark for evaluating the...
Tags: arxiv, papers, benchmark