Research, 2026-05-11

Sparse Autoencoders as Plug-and-Play Firewalls for Adversarial Attack Detection in VLMs

arXiv:2605.07447v1 (announce type: cross)

Abstract: Vision-language models (VLMs) have advanced rapidly and are increasingly deployed in real-world applications, especially with the rise of agent-based systems. However, their safety has received relatively limited attention. Even the latest...

Read Original Article on Arxiv CS.AI
