Research2026-04-30

Time Blindness: Why Video-Language Models Can't See What Humans Can?

arXiv:2505.24867v2 Announce Type: replace-cross Abstract: Recent advances in vision-language models (VLMs) have made impressive strides in understanding spatio-temporal relationships in videos. However, when spatial information is obscured, these models struggle to capture purely temporal patterns....

Read Original Article on Arxiv CS.AI

arxivpapers