Research 2026-05-12
Detecting Multi-Agent Collusion Through Multi-Agent Interpretability
Source: arXiv cs.AI
arXiv:2604.01151v2. Announce Type: replace. Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception in...
arxiv papers agents