Research2026-05-12
MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring
Source: Arxiv CS.AI
arXiv:2605.09684v1 Announce Type: cross Abstract: We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current...
arxivpapersagents