Haochen Wang
- Papers: 21
Authored papers
SAMTok: Representing Any Mask with Two Words
arXiv 2026
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification
arXiv 2026
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
ICCV 2025
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
arXiv 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
arXiv 2025
PairUni: Pairwise Training for Unified Multimodal Language Models
arXiv 2025
Hita: Holistic Tokenizer for Autoregressive Image Generation
ICCV 2025
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation
arXiv 2025
Reconstructive Visual Instruction Tuning
arXiv 2024
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction
arXiv 2024
VISA: Reasoning Video Object Segmentation via Large Language Models
arXiv 2024
Balancing Logit Variation for Long-tailed Semantic Segmentation
Towards Open-Vocabulary Video Instance Segmentation
ICCV 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
arXiv 2023
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation
arXiv 2023
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
CVPR 2022
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
CVPR 2023
Learning from Future: A Novel Self-Training Framework for Semantic Segmentation
arXiv 2022
Frequent co-authors
10 from 21 papers