Research | 2026-05-12

A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents

arXiv:2512.20798v5. Abstract: As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values is becoming a practical deployment concern. Current benchmarks for AI agents primarily evaluate refusal of...

Read the original article on arXiv cs.AI

arxiv, papers, agents, benchmark