Yuxuan Wang

Papers
35

Authored papers

35

Fish Audio S2 Technical Report

arXiv 2026

Thoth: Mid-Training Bridges LLMs to Time Series Understanding

arXiv 2026

The AI Hippocampus: How Far are We From Human Memory?

arXiv 2026

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

arXiv 2025

Qwen3-Omni Technical Report

arXiv 2025

Qwen3-VL Technical Report

arXiv 2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

arXiv 2025

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

arXiv 2025

Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space

arXiv 2025

TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference

arXiv 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

arXiv 2025

Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain

arXiv 2025

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

arXiv 2025

From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

arXiv 2025

Discrete Markov Bridge

arXiv 2025

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

arXiv 2024

Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding

arXiv 2024

Understanding Multimodal Hallucination with Parameter-Free Representation Alignment

arXiv 2024

STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering

arXiv 2024

Progressive Confident Masking Attention Network for Audio-Visual Segmentation

arXiv 2024

TimeXer: Empowering Transformers for Time Series Forecasting with Exogenous Variables

arXiv 2024

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

arXiv 2024

HawkEye: Training Video-Text LLMs for Grounding Text in Videos

arXiv 2024

VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format

arXiv 2024

Efficient Temporal Extrapolation of Multimodal Large Language Models with Temporal Grounding Bridge

arXiv 2024

AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

arXiv 2023

Separate Anything You Describe

arXiv 2023

Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models

arXiv 2023

VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions

arXiv 2023

Shuo Wen Jie Zi: Rethinking Dictionaries and Glyphs for Chinese Language Pre-training

arXiv 2023

VoiceFixer: A Unified Framework for High-Fidelity Speech Restoration

arXiv 2022

NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism

arXiv 2022

Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

arXiv 2022

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation

arXiv 2021

VoiceFixer: Toward General Speech Restoration with Neural Vocoder

arXiv 2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 35 papers