Bo Dai

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

arXiv 2025

GAS: Generative Avatar Synthesis from a Single Image

ICCV 2025

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

arXiv 2025

ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

arXiv 2025

STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

arXiv 2025

TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

CVPR 2025

Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation

arXiv 2025

CustomX: Unified Character, Action, and Scene Customization in Video World Models

arXiv 2025

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

ICCV 2025

InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts

arXiv 2025

Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians

arXiv 2024

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

arXiv 2024

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

arXiv 2024

GenAD: Generalized Predictive Model for Autonomous Driving

CVPR 2024

MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model

arXiv 2024

GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation

arXiv 2024

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

arXiv 2024

VideoAgent: Self-Improving Video Generation

arXiv 2024

On Domain-Specific Post-Training for Multimodal Large Language Models

arXiv 2024

Matryoshka: Learning to Drive Black-Box LLMs with LLMs

arXiv 2024

DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters

CVPR 2025

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation

arXiv 2024

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

arXiv 2023

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

CVPR 2024

EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM

arXiv 2023

Unified Human-Scene Interaction via Prompted Chain-of-Contacts

arXiv 2023

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

ICCV 2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

arXiv 2023

AdaPlanner: Adaptive Planning from Feedback with Language Models

InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

arXiv 2023

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

ICCV 2023

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

CVPR 2024

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

arXiv 2023

Prototype-based Embedding Network for Scene Graph Generation

CVPR 2023