Research 2026-05-12
HY-Himmel Technical Report: Hierarchical Interleaved Multi-stream Motion Encoding for Long Video Understanding
Source: arXiv cs.AI
arXiv:2605.08158v1. Announce Type: cross. Abstract: Long-video understanding with multimodal language models suffers from three compounding bottlenecks: heavy decode cost to obtain dense RGB frames, quadratic token growth with frame count, and weak motion perception under sparse keyframe sampling. We...
arxiv, papers