Wei-Ning Hsu
- Papers: 12
Authored papers
SAM Audio: Segment Anything in Audio
arXiv 2025
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
arXiv 2025
FlowDec: A flow-based full-band general audio codec with high perceptual quality
arXiv 2025
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
arXiv 2024
Movie Gen: A Cast of Media Foundation Models
arXiv 2024
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
arXiv 2023
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
arXiv 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Preprint 2022
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction
Generative Spoken Language Modeling from Raw Audio
arXiv 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
arXiv 2021
Frequent co-authors
10 from 12 papers