Yike Guo

Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models

arXiv 2026

UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass

arXiv 2026

Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

arXiv 2026

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

arXiv 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

arXiv 2025

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

arXiv 2025

Audio-FLAN: An Instruction-Following Dataset for Unified Audio Understanding and Generation of Speech, Music, and Sound

arXiv 2025

Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging

arXiv 2025

ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning

arXiv 2025

Discovering symbolic expressions with parallelized tree search

arXiv 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM

arXiv 2024

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

arXiv 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

arXiv 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

arXiv 2024

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

arXiv 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

CVPR 2025

You Know What I'm Saying: Jailbreak Attack via Implicit Reference

arXiv 2024

Importance Weighting Can Help Large Language Models Self-Improve

arXiv 2024

A Survey of Reasoning with Foundation Models

arXiv 2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

arXiv 2023

CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

arXiv 2023

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

arXiv 2023