Research · 2026-05-14

VideoSEAL: Mitigating Evidence Misalignment in Agentic Long Video Understanding by Decoupling Answer Authority

arXiv:2605.12571v1 (announce type: cross)

Abstract: Long video question answering requires locating sparse, time-scattered visual evidence within highly redundant content. Although current MLLMs perform well on short videos, long videos introduce long-horizon search and verification, which often...

Read the original article on arXiv cs.AI

Tags: arxiv, papers, agents