Research | 2026-05-07
Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models
Source: arXiv cs.AI
arXiv:2605.02912v1 | Announce Type: cross
Abstract: Video Anomaly Detection (VAD) has traditionally been framed as binary classification or outlier detection, providing neither interpretable reasoning nor precise spatial localization of anomalous events. While Vision-Language Models (VLMs) offer rich...
Tags: arxiv, papers, reasoning, multimodal