Research | 2026-04-27

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

arXiv:2604.22209v1 (announce type: cross). Abstract: Generative audio modeling has largely been fragmented into specialized tasks: text-to-speech (TTS), text-to-music (TTM), and text-to-audio (TTA), each operating under heterogeneous control paradigms. Unifying these modalities remains a fundamental...

Read Original Article on Arxiv CS.AI
