Papers

Trending research and the full catalog, with each paper linked to the benchmarks, methods, and models it introduces.

Filtered by domain: Text-to-speech

Junyang Lin, Jingren Zhou, Hangrui Hu et al. · 22 Jan 2026

In this report, we present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models.

Text-to-speech · Voice cloning

12k

MOSS-TTS Technical Report

Yitian Gong, Botian Jiang, Yiwei Zhao et al. · 18 Mar 2026

This technical report presents MOSS-TTS, a speech generation foundation model built on a scalable recipe: discrete audio tokens, autoregressive modeling, and large-scale pretraining.

Text-to-speech · Voice cloning

3.6k · 0.4/h

Fish Audio S2 Technical Report

Yuxuan Wang, Shijia Liao, Songting Liu et al. · 9 Mar 2026

We introduce Fish Audio S2, an open-sourced text-to-speech system featuring multi-speaker, multi-turn generation, and, most importantly, instruction-following control via natural-language descriptions.

Text-to-speech

31k

dots.tts Technical Report

5 Jun 2026

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold.

Text-to-speech · Voice cloning

789 · 0.2/h