Research 2026-05-11
Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs
Source: arXiv cs.AI
arXiv:2605.07447v1
Announce Type: cross
Abstract: Vision-language models (VLMs) have advanced rapidly and are increasingly deployed in real-world applications, especially with the rise of agent-based systems. However, their safety has received relatively limited attention. Even the latest...
arxiv papers