Lei Yang

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

arXiv 2026

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

arXiv 2026

A Very Big Video Reasoning Suite

arXiv 2026

Demystifying Video Reasoning

arXiv 2026

EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents

arXiv 2026

EgoLife: Towards Egocentric Life Assistant

CVPR 2025

ConsistCompose: Unified Multimodal Layout Control for Image Composition

arXiv 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

arXiv 2025

Step-Audio 2 Technical Report

arXiv 2025

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

arXiv 2025

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

arXiv 2025

TokensGen: Harnessing Condensed Tokens for Long Video Generation

ICCV 2025

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

arXiv 2025

ProBench: Benchmarking Large Language Models in Competitive Programming

arXiv 2025

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

arXiv 2024

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

arXiv 2024

UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model

arXiv 2024

Clients Collaborate: Flexible Differentially Private Federated Learning with Guaranteed Improvement of Utility-Privacy Trade-off

arXiv 2024

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

CVPR 2024

FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data

arXiv 2024

WHAC: World-grounded Humans and Cameras

arXiv 2024

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

CVPR 2024

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering

ICCV 2023

Contrastive Deep Nonnegative Matrix Factorization for Community Detection

arXiv 2023

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

ICCV 2023

SHERF: Generalizable Human NeRF from a Single Image

ICCV 2023

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

arXiv 2023

ReliTalk: Relightable Talking Portrait Generation from a Single Video

arXiv 2023

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

ICCV 2023