Xudong Wang
- Papers: 20
Authored papers
MMaDA-VLA: Large Diffusion Vision-Language-Action Model with Unified Multi-Modal Instruction and Generation
arXiv 2026
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
arXiv 2026
UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking
arXiv 2026
MMHCL: Multi-Modal Hypergraph Contrastive Learning for Recommendation
arXiv 2025
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
arXiv 2025
Reconstruction Alignment Improves Unified Multimodal Models
arXiv 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
arXiv 2025
UnSAMv2: Self-Supervised Learning Enables Segment Anything at Any Granularity
arXiv 2025
Constantly Improving Image Models Need Constantly Improving Benchmarks
arXiv 2025
Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation
arXiv 2025
TULIP: Towards Unified Language-Image Pretraining
arXiv 2025
Visually Prompted Benchmarks Are Surprisingly Fragile
arXiv 2025
Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model
arXiv 2025
Segment Anything without Supervision
arXiv 2024
InstanceDiffusion: Instance-level Control for Image Generation
CVPR 2024
Rethinking Patch Dependence for Masked Autoencoders
arXiv 2024
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
CVPR 2024
Hierarchical Open-vocabulary Universal Image Segmentation
Unsupervised Universal Image Segmentation
CVPR 2024
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts
Frequent co-authors
10 from 20 papers