Research · 2026-04-28
Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
Source: arXiv cs.AI
arXiv:2604.16286v2 (announce type: replace)

Abstract: As AI systems are increasingly used to conduct research autonomously, misaligned systems could introduce subtle flaws that produce misleading results while evading detection. We introduce Auditing Sabotage Bench, a benchmark for evaluating the...
Tags: arxiv, papers, benchmark