Research 2026-05-05

Jailbroken Frontier Models Retain Their Capabilities

arXiv:2605.00267v1. Announce Type: cross. Abstract: As language model safeguards become more robust, attackers are pushed toward developing increasingly complex jailbreaks. Prior work has found that this complexity imposes a "jailbreak tax" that degrades the target model's task performance. We show...

Read the original article on arXiv cs.AI
