Research2026-04-30
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Source: Arxiv CS.AI
arXiv:2505.24867v2 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have made impressive strides in understanding spatio-temporal relationships in videos. However, when spatial information is obscured, these models struggle to capture purely temporal patterns....
arxivpapers