Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results show that no model performed well universally. SALMONN-13B excelled in English ASR and Qwen2-Audio-7B-Instruct showed high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We open-source all task data and the evaluation pipeline at https://github.com/dynamic-superb/dynamic-superb.
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Dynamic-SUPERB Phase-2 expands the evaluation benchmark for universal speech models with diverse tasks, revealing ongoing challenges in handling a broad range of instructions.
- Year: 2024
- Venue: arXiv 2024
- Authors: 80
- Hosting: Abstract only
Attribution
- Abstract & full text: arxiv.org/abs/2411.05361v2
- TL;DR: Semantic Scholar
Authors
Vahid Noroozi, William Chen, Shinji Watanabe, Puyuan Peng, David Harwath, Kai-Wei Chang, Andy T. Liu, Hung-Yi Lee, Shu-wen Yang, Karen Livescu, Wei-Cheng Tseng, Jia Qi Yip, Jiatong Shi, Hsi-Che Lin, Yi-Jen Shih, Ankita Pasad, Ke-Han Lu, Zhehuai Chen, Chao-Han Huck Yang, Anuj Diwan, Chih-Kai Yang, Chee-En Yu, Chun-Wei Chen, Wei-Chih Chen, Chien-yu Huang, Yi-Cheng Lin, Yu-Xiang Lin, Chi-An Fu, Chun-Yi Kuan, Wenze Ren, Xuanjun Chen, Tzu-Quan Lin, Kuan-Po Huang, Haibin Wu, Chen-An Li, Chi-Yuan Hsiao, Shih-Heng Wang, Fabian Ritter-Gutierrez, Siddhant Arora, You-Kuan Lin, Ming To Chuang, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Jun-You Wang, Cheng-Hsiu Hsieh, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chieh-Chi Yang, Shao-Xiang Yuan