BeClaude
Research 2026-05-12

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

Source: arXiv cs.AI

arXiv:2604.01151v2

Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception in...

arxiv, papers, agents