Research | 2026-04-17
Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
Source: arXiv cs.AI
arXiv:2602.20981v3 (announce type: replace-cross)
Abstract: Scaling multimodal alignment between video and audio is challenging, particularly due to limited data and the mismatch between text descriptions and frame-level video information. In this work, we tackle the scaling challenge in...