BeClaude
Research | 2026-05-08

HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Source: arXiv cs.AI

arXiv:2601.14724v4 (Announce Type: replace-cross)

Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated significant improvement in offline video understanding. However, extending these capabilities to streaming video inputs remains challenging, as existing...
