Junyang Lin, Jingren Zhou, Hangrui Hu et al. · 22 Jan 2026
In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models.
Trending research and the full catalog, with each paper linked to the benchmarks, methods, and models it introduces.
Yitian Gong, Botian Jiang, Yiwei Zhao et al. · 18 Mar 2026
This technical report presents MOSS-TTS, a speech generation foundation model built on a scalable recipe: discrete audio tokens, autoregressive modeling, and large-scale pretraining.
Yuxuan Wang, Shijia Liao, Songting Liu et al. · 9 Mar 2026
We introduce Fish Audio S2, an open-sourced text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions.
5 Jun 2026
We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold.