Research, 2026-05-12

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

arXiv:2605.10582v1 (announce type: cross)

Abstract: This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel...
