Thanks to improvements in machine learning techniques, including deep learning, free large-scale speech corpora that can be shared between academic institutions and commercial companies now play an important role. However, no such corpus exists for Japanese speech synthesis. In this paper, we design a novel Japanese speech corpus, named the "JSUT corpus," that is aimed at achieving end-to-end speech synthesis. The corpus consists of 10 hours of reading-style speech data and their transcriptions, and it covers all of the main pronunciations of daily-use Japanese characters. We describe how we designed and analyzed the corpus. The corpus is freely available online.
JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis
A new Japanese speech corpus, JSUT, is designed for end-to-end speech synthesis and is freely available online.
- Year
- 2017
- Venue
- arXiv 2017
- Authors
- 3
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/1711.00354
- TL;DR
- Semantic Scholar