Wengang Zhou
- Papers: 23
Authored papers: 23
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
arXiv 2025
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
arXiv 2025
Uni-Sign: Toward Unified Sign Language Understanding at Scale
arXiv 2025
Make-It-Poseable: Feed-forward Latent Posing Model for 3D Humanoid Character Animation
arXiv 2025
Robust Multimodal Large Language Models Against Modality Conflict
arXiv 2025
ROOT: VLM based System for Indoor Scene Understanding and Beyond
arXiv 2024
BoolQuestions: Does Dense Retrieval Understand Boolean Logic in Language?
arXiv 2024
Sinkhorn Distance Minimization for Knowledge Distillation
arXiv 2024
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding
arXiv 2024
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser
arXiv 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
arXiv 2024
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding
arXiv 2024
EG4D: Explicit Generation of 4D Object without Score Distillation
arXiv 2024
Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning
arXiv 2024
DIRE for Diffusion-Generated Image Detection
ICCV 2023
Hybrid and Collaborative Passage Reranking
arXiv 2023
Masked Motion Predictors are Strong 3D Action Representation Learners
ICCV 2023
Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs
arXiv 2023
Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection
ICCV 2023
Semantic Image Synthesis via Diffusion Models
arXiv 2022
Geometric Representation Learning for Document Image Rectification
arXiv 2022
DocScanner: Robust Document Image Rectification with Progressive Learning
arXiv 2021
Uformer: A General U-Shaped Transformer for Image Restoration
CVPR 2022
Frequent co-authors
10 from 23 papers