Zhongang Cai

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

arXiv 2026

A Very Big Video Reasoning Suite

arXiv 2026

Demystifying Video Reasoning

arXiv 2026

EgoLife: Towards Egocentric Life Assistant

CVPR 2025

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

arXiv 2025

ConsistCompose: Unified Multimodal Layout Control for Image Composition

arXiv 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

arXiv 2025

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

arXiv 2025

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

arXiv 2025

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

arXiv 2024

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

CVPR 2024

WHAC: World-grounded Humans and Cameras

arXiv 2024

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

CVPR 2024

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

ICCV 2023

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

ICCV 2023

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

arXiv 2023

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

ICCV 2023

BiBench: Benchmarking and Analyzing Network Binarization

arXiv 2023