Research | 2026-05-12
Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing
Source: Arxiv CS.AI
arXiv:2605.10582v1 | Announce Type: cross
Abstract: This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel...
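The abstract is truncated, so the paper's actual disrupt and rectify operators and its guarantee are not shown here. To illustrate the general smoothing pattern the abstract cites, here is a minimal Python sketch: perturb the prompt many times, pass each noisy copy through a denoiser, classify the result, and take a majority vote. The names `perturb` and `smoothed_refuse`, the random character-replacement noise, and the caller-supplied `denoise` and `is_unsafe` functions are hypothetical stand-ins, not the authors' method.

```python
import random
import string
from collections import Counter
from typing import Callable


def perturb(prompt: str, q: float, rng: random.Random) -> str:
    """Hypothetical 'disrupt' step: replace a fraction q of characters at random."""
    chars = list(prompt)
    n = int(len(chars) * q)
    for i in rng.sample(range(len(chars)), n):
        chars[i] = rng.choice(string.ascii_letters)
    return "".join(chars)


def smoothed_refuse(
    prompt: str,
    denoise: Callable[[str], str],     # hypothetical 'rectify' step, supplied by caller
    is_unsafe: Callable[[str], bool],  # hypothetical safety judge, supplied by caller
    num_samples: int = 10,
    q: float = 0.1,
    seed: int = 0,
) -> bool:
    """Return True if a majority of perturbed-then-denoised copies are judged unsafe."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        noisy = perturb(prompt, q, rng)
        restored = denoise(noisy)
        votes[is_unsafe(restored)] += 1
    return votes[True] > num_samples / 2
```

For example, `smoothed_refuse(user_prompt, denoise=lambda s: s, is_unsafe=my_judge)` uses an identity denoiser with some safety judge `my_judge`. The point of the majority vote is that an adversarial suffix tuned to one exact character sequence tends not to survive random perturbation. Formal guarantees of the kind the paper claims require a specific noise model and analysis that this sketch does not attempt.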