Rui Zhao

Papers
34

Authored papers

34

MIND: Benchmarking Memory Consistency and Action Control in World Models

arXiv 2026

2026

FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

arXiv 2026

2026

Seed1.5-VL Technical Report

arXiv 2025

2025

Motion Anything: Any to Motion Generation

arXiv 2025

2025

SORCE: Small Object Retrieval in Complex Environments

arXiv 2025

2025

Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation

arXiv 2025

2025

Glance: Accelerating Diffusion Models with 1 Sample

arXiv 2025

2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

arXiv 2025

2025

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

arXiv 2025

2025

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

arXiv 2025

2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

arXiv 2025

2025

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

CVPR 2025

2025

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment

arXiv 2024

2024

DragAnything: Motion Control for Anything using Entity Representation

arXiv 2024

2024

PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency

arXiv 2024

2024

Causal Evaluation of Language Models

arXiv 2024

2024

KMM: Key Frame Mask Mamba for Extended Motion Generation

arXiv 2024

2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICCV 2025

2024

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

arXiv 2024

2024

Eliminating Feature Ambiguity for Few-Shot Segmentation

arXiv 2024

2024

CLEAR: Can Language Models Really Understand Causal Graphs?

arXiv 2024

2024

Balancing Speciality and Versatility: a Coarse to Fine Framework for Supervised Fine-tuning Large Language Model

arXiv 2024

2024

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

arXiv 2023

2023

Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic

arXiv 2023

2023

Human Preference Score: Better Aligning Text-to-Image Models with Human Preference

ICCV 2023

2023

Described Object Detection: Liberating Object Detection with Flexible Expressions

2023

Link-Context Learning for Multimodal LLMs

CVPR 2024

2023

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

arXiv 2023

2023

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis

arXiv 2023

2023

UniHCP: A Unified Model for Human-Centric Perceptions

CVPR 2023

2023

InstructDET: Diversifying Referring Object Detection with Generalized Instructions

arXiv 2023

2023

Balancing Logit Variation for Long-tailed Semantic Segmentation

2023

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

CVPR 2022

2022

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

arXiv 2022

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 34 papers