Research 2026-05-12

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

arXiv:2604.01151v2 (announce type: replace)

Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have shown promise for detecting deception in...

Read Original Article on Arxiv CS.AI

arxiv, papers, agents