BeClaude
Research 2026-05-01

Latent Adversarial Detection: Adaptive Probing of LLM Activations for Multi-Turn Attack Detection

Source: arXiv cs.AI

arXiv:2604.28129v1. Announce Type: cross. Abstract: Multi-turn prompt injection follows a known attack path (trust-building, pivoting, escalation), but text-level defenses miss covert attacks in which individual turns appear benign. We show this attack path leaves an activation-level signature in the...

arxiv, papers