We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at \url{https://bytedancespeech.github.io/seedtts_tech_report}.
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
Seed-TTS is a family of large-scale TTS models that generate high-quality speech with in-context learning and superior controllability. The family includes a non-autoregressive variant with a diffusion-based architecture that does not rely on pre-estimated phoneme durations.
- Year: 2024
- Venue: arXiv 2024
- Authors: 46
Attribution
- Abstract & full text: arxiv.org/abs/2406.02430
- TL;DR: Semantic Scholar
Authors
Xin Wang, Hui Li, Shuo Zhang, Yuxuan Wang, Zhuo Chen, Jiaxin Li, Yang Zhang, Xiaoyang Li, Yuchen Liu, Jiawei Chen, Yuping Wang, Philip Anastassiou, Jitong Chen, Yuanzhe Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, YuanYuan Huo, Dongya Jia, ChuMin Li, Feiya Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, Xudong Liu, Zhengxi Liu, Lu Lu, Junjie Pan, Zhen Wei, Jian Wu, Chao Yao, Yifeng Yang, YuanHao Yi, Junteng Zhang, Qidi Zhang, Wenjie Zhang, Zilin Zhao, Dejian Zhong, Xiaobin Zhuang