Research 2026-04-27
UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions
Source: arXiv cs.AI
arXiv:2604.22209v1, Announce Type: cross
Abstract: Generative audio modeling has largely been fragmented into specialized tasks: text-to-speech (TTS), text-to-music (TTM), and text-to-audio (TTA), each operating under heterogeneous control paradigms. Unifying these modalities remains a fundamental...