Research · 2026-05-12
Not All Turns Matter: Credit Assignment for Multi-Turn Jailbreaking
Source: Arxiv CS.AI
arXiv:2605.08778v1 Announce Type: new Abstract: Deploying LLMs in multi-turn dialogues facilitates jailbreak attacks that distribute harmful intent across seemingly benign turns. Recent training-based multi-turn jailbreak methods learn long-horizon attack strategies from interaction feedback, but...