Research | 2026-05-12
The World is Not Mono: Enabling Spatial Understanding in Large Audio-Language Models
Source: arXiv cs.AI
arXiv:2601.02954v3 (announce type: replace-cross)
Abstract: Large audio-language models have made rapid progress in recognizing what is present in an audio clip, but spatial audio-language understanding still lacks a clear task interface. A model must also decide where sound events occur, which...
Tags: arxiv, papers