Hung-Yi Lee
- Papers: 36
Authored papers (36)
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
arXiv 2026
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation
arXiv 2026
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
arXiv 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
arXiv 2025
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
arXiv 2025
ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality
arXiv 2025
Spectral-Aware Low-Rank Adaptation for Speaker Verification
arXiv 2025
CodeJudgeBench: Benchmarking LLM-as-a-Judge for Coding Tasks
arXiv 2025
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
arXiv 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
arXiv 2025
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
arXiv 2025
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
arXiv 2025
REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR
arXiv 2024
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
arXiv 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
arXiv 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
arXiv 2024
StreamBench: Towards Benchmarking Continuous Improvement of Language Agents
arXiv 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
arXiv 2024
Examining Forgetting in Continual Pre-training of Aligned Large Language Models
arXiv 2024
I Need Help! Evaluating LLM's Ability to Ask for Users' Support: A Case Study on Text-to-SQL Generation
arXiv 2024
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models
arXiv 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
arXiv 2024
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models
arXiv 2024
LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
arXiv 2024
A Closer Look into Automatic Evaluation Using Large Language Models
arXiv 2023
Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision
arXiv 2023
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
arXiv 2022
DUAL: Discrete Spoken Unit Adaptive Learning for Textless Spoken Question Answering
arXiv 2022
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
Findings (NAACL) 2022 7
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
arXiv 2021
S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
arXiv 2021
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech
arXiv 2020
Pretrained Language Model Embryology: The Birth of ALBERT
EMNLP 2020 11
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
arXiv 2019
Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension
arXiv 2018
Learning Chinese Word Representations From Glyphs Of Characters
Frequent co-authors
10 from 36 papers