Shuai Wang
- Papers: 39
Authored papers (39)
SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation
arXiv 2026
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
arXiv 2026
Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
arXiv 2026
DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models
arXiv 2026
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
arXiv 2026
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
arXiv 2025
LeVo: High-Quality Song Generation with Multi-Preference Alignment
arXiv 2025
SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement
arXiv 2025
DDT: Decoupled Diffusion Transformer
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale
arXiv 2025
SoK: Evaluating Jailbreak Guardrails for Large Language Models
arXiv 2025
ACEBench: Who Wins the Match Point in Tool Learning?
arXiv 2025
SocialEval: Evaluating Social Intelligence of Large Language Models
arXiv 2025
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
arXiv 2025
Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
arXiv 2025
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
arXiv 2025
PixNerd: Pixel Neural Field Diffusion
arXiv 2025
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing
arXiv 2025
Differentiable Solver Search for Fast Diffusion Sampling
arXiv 2025
HiconAgent: History Context-aware Policy Optimization for GUI Agents
arXiv 2025
Advances in Speech Separation: Techniques, Challenges, and Future Trends
arXiv 2025
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
arXiv 2025
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
arXiv 2025
DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
arXiv 2025
CHOP: Mobile Operating Assistant with Constrained High-frequency Optimized Subtask Planning
arXiv 2025
NAMET: Robust Massive Model Editing via Noise-Aware Memory Optimization
arXiv 2025
DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction
arXiv 2025
Symbolic Learning Enables Self-Evolving Agents
arXiv 2024
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation
arXiv 2024
AI PERSONA: Towards Life-long Personalization of LLMs
arXiv 2024
StepTool: A Step-grained Reinforcement Learning Framework for Tool Learning in LLMs
arXiv 2024
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark
arXiv 2024
OOP: Object-Oriented Programming Evaluation Benchmark for Large Language Models
arXiv 2024
Starbucks: Improved Training for 2D Matryoshka Embeddings
arXiv 2024
Tackling Data Heterogeneity in Federated Learning via Loss Decomposition
arXiv 2024
Deep Equilibrium Object Detection
ICCV 2023
Parsing is All You Need for Accurate Gait Recognition in the Wild
arXiv 2023
Towards Open-Vocabulary Video Instance Segmentation
ICCV 2023
MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction
arXiv 2022
Frequent co-authors
10 from 39 papers