SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue Paper • 2605.30993 • Published May 29 • 62
Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer Paper • 2605.30940 • Published May 29 • 38
Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios Paper • 2605.28618 • Published May 27 • 32