Research 2026-05-05
Jailbroken Frontier Models Retain Their Capabilities
Source: arXiv cs.AI
arXiv:2605.00267v1
Announce Type: cross
Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show...
arxiv papers