Xiaohan Wang
Papers: 24
Authored papers
V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
arXiv 2026
ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
arXiv 2026
Tool Verification for Test-Time Reinforcement Learning
arXiv 2026
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
preprint
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
CVPR 2025
Video Action Differencing
arXiv 2025
Temporal Preference Optimization for Long-Form Video Understanding
arXiv 2025
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
arXiv 2025
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
arXiv 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
arXiv 2025
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
CVPR 2025
DeepSeek-V3 Technical Report
arXiv 2024
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
arXiv 2024
Why are Visually-Grounded Language Models Bad at Image Classification?
arXiv 2024
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
arXiv 2024
How to Unleash the Power of Large Language Models for Few-shot Relation Extraction?
arXiv 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
arXiv 2023
Whitening-based Contrastive Learning of Sentence Embeddings
arXiv 2023
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
arXiv 2023
Describing Differences in Image Sets with Natural Language
CVPR 2024
Bird's-Eye-View Scene Graph for Vision-Language Navigation
ICCV 2023
Clustering based Point Cloud Representation Learning for 3D Analysis
ICCV 2023
JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery
ICCV 2023
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
arXiv 2022
Frequent co-authors
10 from 24 papers