Research | 2026-05-12
A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents
Source: arXiv cs.AI
arXiv:2512.20798v5 | Announce Type: replace
Abstract: As autonomous AI agents are increasingly deployed in high-stakes environments, ensuring their safety and alignment with human values is becoming a practical deployment concern. Current benchmarks for AI agents primarily evaluate refusal of...
Tags: arxiv, papers, agents, benchmark