Rui Zhao
- Papers: 34
Authored papers (34)
MIND: Benchmarking Memory Consistency and Action Control in World Models
arXiv 2026
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
arXiv 2026
Seed1.5-VL Technical Report
arXiv 2025
Motion Anything: Any to Motion Generation
arXiv 2025
SORCE: Small Object Retrieval in Complex Environments
arXiv 2025
Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation
arXiv 2025
Glance: Accelerating Diffusion Models with 1 Sample
arXiv 2025
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
arXiv 2025
UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback
arXiv 2025
PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
arXiv 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
arXiv 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
CVPR 2025
TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment
arXiv 2024
DragAnything: Motion Control for Anything using Entity Representation
arXiv 2024
PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency
arXiv 2024
Causal Evaluation of Language Models
arXiv 2024
KMM: Key Frame Mask Mamba for Extended Motion Generation
arXiv 2024
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models
arXiv 2024
Eliminating Feature Ambiguity for Few-Shot Segmentation
arXiv 2024
CLEAR: Can Language Models Really Understand Causal Graphs?
arXiv 2024
Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model
arXiv 2024
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
arXiv 2023
Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic
arXiv 2023
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference
ICCV 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Link-Context Learning for Multimodal LLMs
CVPR 2024
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
arXiv 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
arXiv 2023
UniHCP: A Unified Model for Human-Centric Perceptions
CVPR 2023
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
arXiv 2023
Balancing Logit Variation for Long-tailed Semantic Segmentation
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
CVPR 2022
Learning from Future: A Novel Self-Training Framework for Semantic Segmentation
arXiv 2022
Affiliations
Frequent co-authors
10 from 34 papers