BeClaude
Research | 2026-05-12

Guaranteed Jailbreaking Defense via Disrupt-and-Rectify Smoothing

Source: arXiv cs.AI

arXiv:2605.10582v1 (announce type: cross)

Abstract: This paper proposes a guaranteed defense method for large language models (LLMs) to safeguard against jailbreaking attacks. Drawing inspiration from the denoised-smoothing approach in the adversarial defense domain, we propose a novel...
