Wenhao Chai
- Papers: 28
Authored papers
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
arXiv 2026
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI
arXiv 2026
Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
arXiv 2026
BabyVision: Visual Reasoning Beyond Language
arXiv 2026
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
arXiv 2026
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
arXiv 2025
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
arXiv 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
arXiv 2025
Science-T2I: Addressing Scientific Illusions in Image Synthesis
CVPR 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
arXiv 2025
FrontierCS: Evolving Challenges for Evolving Intelligence
arXiv 2025
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
arXiv 2025
VideoNSA: Native Sparse Attention Scales Video Understanding
arXiv 2025
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
arXiv 2025
An Empirical Study of GPT-4o Image Generation Capabilities
arXiv 2025
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
arXiv 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
arXiv 2025
Next-Embedding Prediction Makes Strong Vision Learners
arXiv 2025
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
CVPR 2025
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
CVPR 2024
Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement
arXiv 2023
DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models
arXiv 2023
Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation
arXiv 2023
Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation
ICCV 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing
ICCV 2023
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
arXiv 2023
Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model
arXiv 2022
Frequent co-authors
10 from 28 papers