Research 2026-05-12

Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models

arXiv:2605.08974v1 (cross-list). Abstract: While multimodal large language models (MLLMs) have advanced video understanding, they remain highly prone to hallucinations in dynamic scenes. We argue this stems from a failure in spatio-temporal monitoring, the ability to persistently track...

Read the original article on arXiv cs.AI
