Chen Wei

MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

arXiv 2026

Chain of World: World Model Thinking in Latent Motion

arXiv 2026

PyVision-RL: Forging Open Agentic Vision Models via RL

arXiv 2026

Perception Encoder: The best visual embeddings are not at the output of the network

arXiv 2025

Play to Generalize: Learning to Reason Through Game Play

arXiv 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

arXiv 2025

PyVision: Agentic Vision with Dynamic Tooling

arXiv 2025

The Quest for Generalizable Motion Generation: Data, Model, and Evaluation

arXiv 2025

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

arXiv 2025

FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation

arXiv 2025

SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation

arXiv 2025

Models Are Codes: Towards Measuring Malicious Code Poisoning Attacks on Pre-trained Model Hubs

arXiv 2024

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

CVPR 2024

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

arXiv 2024

WHAC: World-grounded Humans and Cameras

arXiv 2024