0

A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models

Balalaika, a new Russian speech dataset, improves speech synthesis and enhancement through comprehensive annotations and large-scale data.

Year
2025
Venue
arXiv 2025
Authors
7
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2507.13563ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Russian speech synthesis presents distinctive challenges, including vowel reduction, consonant devoicing, variable stress patterns, homograph ambiguity, and unnatural intonation. This paper introduces Balalaika, a novel dataset comprising more than 2,000 hours of studio-quality Russian speech with comprehensive textual annotations, including punctuation and stress markings. Experimental results show that models trained on Balalaika significantly outperform those trained on existing datasets in both speech synthesis and enhancement tasks. We detail the dataset construction pipeline, annotation methodology, and results of comparative evaluations.

Authors

7