Kai Yu

Papers
21

Authored papers

21

Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

arXiv 2026

2026

HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer

arXiv 2025

2025

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

arXiv 2025

2025

URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models

arXiv 2025

2025

NeuSym-RAG: Hybrid Neural Symbolic Retrieval with Multiview Structuring for PDF Question Answering

arXiv 2025

2025

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

arXiv 2024

2024

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

arXiv 2024

2024

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

arXiv 2024

2024

MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task Automation

arXiv 2024

2024

FakeSound: Deepfake General Audio Detection

arXiv 2024

2024

UrFound: Towards Universal Retinal Foundation Models via Knowledge-Guided Masked Modeling

arXiv 2024

2024

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

arXiv 2024

2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

arXiv 2024

2024

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

arXiv 2024

2024

Large Language Models Are Semi-Parametric Reinforcement Learning Agents

2023

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

arXiv 2023

2023

CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset

arXiv 2023

2023

DiffDub: Person-generic Visual Dubbing Using Inpainting Renderer with Diffusion Auto-encoder

arXiv 2023

2023

Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning

ICCV 2023

2023

Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction

arXiv 2023

2023

Towards Instance-adaptive Inference for Federated Learning

ICCV 2023

2023

Affiliations

No known affiliations.

Frequent co-authors

10

from 21 papers