Research, 2026-04-20
Jailbreak Scaling Laws for Large Language Models: Polynomial-Exponential Crossover
Source: arXiv cs.AI
arXiv:2603.11331v2. Announce Type: replace-cross.
Abstract: Adversarial attacks can reliably steer safety-aligned large language models toward unsafe behavior. Empirically, we find that strong adversarial prompt-injection attacks can amplify attack success rate from the slow polynomial growth...
arxiv, papers