Xinchao Wang
- Papers: 68
Authored papers (68)
Q-ARVD: Quantizing Autoregressive Video Diffusion Models
arXiv 2026
Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
arXiv 2026
DMax: Aggressive Parallel Decoding for dLLMs
arXiv 2026
ViMU: Benchmarking Video Metaphorical Understanding
arXiv 2026
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer
arXiv 2026
On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment
arXiv 2026
dVoting: Fast Voting for dLLMs
arXiv 2026
Make Geometry Matter for Spatial Reasoning
arXiv 2026
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs
arXiv 2026
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models
arXiv 2026
Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers
arXiv 2026
AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration
arXiv 2026
NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
arXiv 2026
OminiControl2: Efficient Conditioning for Diffusion Transformers
arXiv 2025
Discrete Diffusion in Large Language and Multimodal Models: A Survey
arXiv 2025
Efficient Reasoning Models: A Survey
arXiv 2025
dKV-Cache: The Cache for Diffusion Language Models
arXiv 2025
Test3R: Learning to Reconstruct 3D at Test Time
arXiv 2025
Minute-Long Videos with Dual Parallelisms
arXiv 2025
PE3R: Perception-Efficient 3D Reconstruction
arXiv 2025
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
arXiv 2025
Thinkless: LLM Learns When to Think
arXiv 2025
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
arXiv 2025
Image Editing As Programs with Diffusion Models
arXiv 2025
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
arXiv 2025
SpotEdit: Selective Region Editing in Diffusion Transformers
arXiv 2025
Vision Bridge Transformer at Scale
arXiv 2025
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion
arXiv 2025
dParallel: Learnable Parallel Decoding for dLLMs
arXiv 2025
SparseD: Sparse Attention for Diffusion Language Models
arXiv 2025
Introducing Visual Perception Token into Multimodal Large Language Model
arXiv 2025
Ultra-Resolution Adaptation with Ease
arXiv 2025
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
arXiv 2025
In-Video Instructions: Visual Signals as Generative Control
arXiv 2025
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
arXiv 2025
MambaOut: Do We Really Need Mamba for Vision?
CVPR 2025
Kolmogorov-Arnold Transformer
arXiv 2024
AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
arXiv 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
arXiv 2024
LinFusion: 1 GPU, 1 Minute, 16K Image
arXiv 2024
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
CVPR 2025
OminiControl: Minimal and Universal Control for Diffusion Transformer
ICCV 2025
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
arXiv 2024
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
arXiv 2024
Hash3D: Training-free Acceleration for 3D Generation
CVPR 2025
KAN or MLP: A Fairer Comparison
arXiv 2024
TinyFusion: Diffusion Transformers Learned Shallow
CVPR 2025
Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
arXiv 2024
MindBridge: A Cross-Subject Brain Decoding Framework
CVPR 2024
Attention Prompting on Image for Large Vision-Language Models
arXiv 2024
Poison-splat: Computation Cost Attack on 3D Gaussian Splatting
arXiv 2024
MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
arXiv 2024
Compositional Video Generation as Flow Equalization
arXiv 2024
Vista3D: Unravel the 3D Darkside of a Single Image
arXiv 2024
Unsegment Anything by Simulating Deformation
CVPR 2024
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
arXiv 2024
LLM-Pruner: On the Structural Pruning of Large Language Models
SlimSAM: 0.1% Data Makes Segment Anything Slim
arXiv 2023
TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration
ICCV 2023
DepGraph: Towards Any Structural Pruning
CVPR 2023
DeepCache: Accelerating Diffusion Models for Free
CVPR 2024
SG-Former: Self-guided Transformer with Evolving Token Reallocation
ICCV 2023
Diffusion Model as Representation Learner
ICCV 2023
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
ICCV 2023
PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection
ICCV 2023
Inception Transformer
arXiv 2022
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning
arXiv 2022
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data
arXiv 2020
Frequent co-authors
10 from 68 papers