Research 2026-05-08
Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
Source: arXiv cs.AI
arXiv:2409.07985v2 (announce type: replace)
Abstract: To evaluate the safety and usefulness of deployment protocols for untrusted AIs, AI Control uses a red-teaming exercise played between a protocol designer and an adversary. This paper introduces AI-Control Games, a formal decision-making model of...
arxiv papers safety