Research 2026-05-12
Bridging Modalities, Spanning Time: Structured Memory for Ultra-Long Agentic Video Reasoning
Source: Arxiv CS.AI
arXiv:2605.08271v1 Announce Type: cross Abstract: Understanding ultra-long videos, such as egocentric recordings, live streams, or surveillance footage spanning days to weeks, remains a challenge for current multimodal LLMs: even with million-token context windows, frame budgets cover only tens of...
arxiv, papers, reasoning, agents