Research · 2026-04-24
Structured Visual Narratives Undermine Safety Alignment in Multimodal Large Language Models
Source: Arxiv CS.AI
arXiv:2603.21697v2 | Announce Type: replace-cross

Abstract: Multimodal Large Language Models (MLLMs) extend text-only LLMs with visual reasoning, but also introduce new safety failure modes under visually grounded instructions. We study comic-template jailbreaks that embed harmful goals inside simple...
Tags: arxiv, papers, safety, multimodal