Xihui Liu
- Papers: 59
Authored papers
World Guidance: World Modeling in Condition Space for Action Generation
arXiv 2026
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
arXiv 2026
Cubic Discrete Diffusion: Discrete Visual Generation on High-Dimensional Representation Tokens
arXiv 2026
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
arXiv 2026
MultiWorld: Scalable Multi-Agent Multi-View Video World Models
arXiv 2026
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data
arXiv 2026
EgoSim: Egocentric World Simulator for Embodied Interaction Generation
arXiv 2026
From Pixels to Concepts: Do Segmentation Models Understand What They Segment?
arXiv 2026
Fair splits flip the leaderboard: CHANRG reveals limited generalization in RNA secondary-structure prediction
arXiv 2026
AIBench: Evaluating Visual-Logical Consistency in Academic Illustration Generation
arXiv 2026
GameFactory: Creating New Games with Generative Interactive Videos
ICCV 2025
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
arXiv 2025
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
arXiv 2025
DreamCube: 3D Panorama Generation via Multi-plane Synchronization
arXiv 2025
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
arXiv 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
CVPR 2025
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
arXiv 2025
Personalized Text-to-Image Generation with Auto-Regressive Models
arXiv 2025
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
arXiv 2025
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
arXiv 2025
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling
arXiv 2025
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
arXiv 2025
CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
arXiv 2025
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark
arXiv 2025
OmniX: From Unified Panoramic Generation and Perception to Graphics-Ready 3D Scenes
arXiv 2025
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
arXiv 2025
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding
arXiv 2025
SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
arXiv 2025
T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation
arXiv 2025
HoloPart: Generative 3D Part Amodal Segmentation
arXiv 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
ICCV 2025
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
ICCV 2025
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
arXiv 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
arXiv 2025
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
CVPR 2025
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
arXiv 2024
Parallelized Autoregressive Visual Generation
CVPR 2025
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
CVPR 2025
Editing Massive Concepts in Text-to-Image Diffusion Models
arXiv 2024
SAMPart3D: Segment Any Part in 3D Objects
arXiv 2024
DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
arXiv 2024
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
arXiv 2024
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
ICCV 2025
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions
arXiv 2024
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
CVPR 2025
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models
arXiv 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
arXiv 2024
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
ICCV 2025
SAM3D: Segment Anything in 3D Scenes
arXiv 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
arXiv 2023
A Survey of Reasoning with Foundation Models
arXiv 2023
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
arXiv 2023
OV-PARTS: Towards Open-Vocabulary Part Segmentation
NeurIPS 2023
DDP: Diffusion Model for Dense Visual Prediction
ICCV 2023
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
CVPR 2024
The ArtBench Dataset: Benchmarking Generative Models with Artworks
arXiv 2022
Back to the Source: Diffusion-Driven Test-Time Adaptation
arXiv 2022
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
CVPR 2023