Xinchao Wang

Papers
68

Attribution: affiliations and profile data from Semantic Scholar.
Authored papers (68)

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

arXiv 2026

2026

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms

arXiv 2026

2026

DMax: Aggressive Parallel Decoding for dLLMs

arXiv 2026

2026

ViMU: Benchmarking Video Metaphorical Understanding

arXiv 2026

2026

ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer

arXiv 2026

2026

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

arXiv 2026

2026

dVoting: Fast Voting for dLLMs

arXiv 2026

2026

Make Geometry Matter for Spatial Reasoning

arXiv 2026

2026

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

arXiv 2026

2026

Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

arXiv 2026

2026

Gated Condition Injection without Multimodal Attention: Towards Controllable Linear-Attention Transformers

arXiv 2026

2026

AutoMIA: Improved Baselines for Membership Inference Attack via Agentic Self-Exploration

arXiv 2026

2026

NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

arXiv 2026

2026

OminiControl2: Efficient Conditioning for Diffusion Transformers

arXiv 2025

2025

Discrete Diffusion in Large Language and Multimodal Models: A Survey

arXiv 2025

2025

Efficient Reasoning Models: A Survey

arXiv 2025

2025

dKV-Cache: The Cache for Diffusion Language Models

arXiv 2025

2025

Test3R: Learning to Reconstruct 3D at Test Time

arXiv 2025

2025

Minute-Long Videos with Dual Parallelisms

arXiv 2025

2025

PE3R: Perception-Efficient 3D Reconstruction

arXiv 2025

2025

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

arXiv 2025

2025

Thinkless: LLM Learns When to Think

arXiv 2025

2025

VeriThinker: Learning to Verify Makes Reasoning Model Efficient

arXiv 2025

2025

Image Editing As Programs with Diffusion Models

arXiv 2025

2025

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

arXiv 2025

2025

SpotEdit: Selective Region Editing in Diffusion Transformers

arXiv 2025

2025

Vision Bridge Transformer at Scale

arXiv 2025

2025

WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion

arXiv 2025

2025

dParallel: Learnable Parallel Decoding for dLLMs

arXiv 2025

2025

SparseD: Sparse Attention for Diffusion Language Models

arXiv 2025

2025

Introducing Visual Perception Token into Multimodal Large Language Model

arXiv 2025

2025

Ultra-Resolution Adaptation with Ease

arXiv 2025

2025

CoT-Valve: Length-Compressible Chain-of-Thought Tuning

arXiv 2025

2025

In-Video Instructions: Visual Signals as Generative Control

arXiv 2025

2025

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

arXiv 2025

2025

MambaOut: Do We Really Need Mamba for Vision?

CVPR 2025

2024

Kolmogorov-Arnold Transformer

arXiv 2024

2024

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

arXiv 2024

2024

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

arXiv 2024

2024

LinFusion: 1 GPU, 1 Minute, 16K Image

arXiv 2024

2024

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

CVPR 2025

2024

OminiControl: Minimal and Universal Control for Diffusion Transformer

ICCV 2025

2024

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

arXiv 2024

2024

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

arXiv 2024

2024

Hash3D: Training-free Acceleration for 3D Generation

CVPR 2025

2024

KAN or MLP: A Fairer Comparison

arXiv 2024

2024

TinyFusion: Diffusion Transformers Learned Shallow

CVPR 2025

2024

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

arXiv 2024

2024

MindBridge: A Cross-Subject Brain Decoding Framework

CVPR 2024

2024

Attention Prompting on Image for Large Vision-Language Models

arXiv 2024

2024

Poison-splat: Computation Cost Attack on 3D Gaussian Splatting

arXiv 2024

2024

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

arXiv 2024

2024

Compositional Video Generation as Flow Equalization

arXiv 2024

2024

Vista3D: Unravel the 3D Darkside of a Single Image

arXiv 2024

2024

Unsegment Anything by Simulating Deformation

CVPR 2024

2024

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

arXiv 2024

2024

LLM-Pruner: On the Structural Pruning of Large Language Models

2023

SlimSAM: 0.1% Data Makes Segment Anything Slim

arXiv 2023

2023

TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

ICCV 2023

2023

DepGraph: Towards Any Structural Pruning

CVPR 2023

2023

DeepCache: Accelerating Diffusion Models for Free

CVPR 2024

2023

SG-Former: Self-guided Transformer with Evolving Token Reallocation

ICCV 2023

2023

Diffusion Model as Representation Learner

ICCV 2023

2023

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

ICCV 2023

2023

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection

ICCV 2023

2023

Inception Transformer

arXiv 2022

2022

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

arXiv 2022

2022

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 68 papers)