Research · 2026-04-17
Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
Source: Arxiv CS.AI
arXiv:2603.08486v2 Announce Type: replace-cross Abstract: Multimodal large language models (MLLMs) suffer from safety misalignment, where visual inputs can enable harmful outputs. Existing methods to address this require explicit safety labels or contrastive data, yet threat-related concepts are concrete...
arxiv · papers · safety