Shinji Watanabe
- Papers: 18
Authored papers (18)
PRiSM: Benchmarking Phone Realization in Speech Models
arXiv 2026
An Empirical Recipe for Universal Phone Recognition
arXiv 2026
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
arXiv 2025
Lessons Learned from the URGENT 2024 Speech Enhancement Challenge
arXiv 2025
SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications
arXiv 2025
BLAB: Brutally Long Audio Bench
arXiv 2025
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
arXiv 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
arXiv 2024
Less Peaky and More Accurate CTC Forced Alignment by Label Priors
arXiv 2024
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
arXiv 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech
arXiv 2023
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
arXiv 2023
DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models
arXiv 2023
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
arXiv 2023
HEAR: Holistic Evaluation of Audio Representations
arXiv 2022
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
arXiv 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
arXiv 2021
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
arXiv 2020
Frequent co-authors
10 from 18 papers