Research | 2026-04-28
UpstreamQA: A Modular Framework for Explicit Reasoning on Video Question Answering Tasks
Source: arXiv cs.AI
arXiv:2604.23145v1 | Announce Type: cross

Abstract: Video Question Answering (VideoQA) demands models that jointly reason over spatial, temporal, and linguistic cues. However, the task's inherent complexity often requires multi-step reasoning that current large multimodal models (LMMs) perform...
Tags: arxiv, papers, reasoning