Research | 2026-05-12
Probing Cross-modal Information Hubs in Audio-Visual LLMs
Source: Arxiv CS.AI
arXiv:2605.10815v1 (announce type: new)
Abstract: Audio-visual large language models (AVLLMs) have recently emerged as a powerful architecture capable of jointly reasoning over audio, visual, and textual modalities. In AVLLMs, the bidirectional interaction between audio and video modalities...
Tags: arxiv, papers