Wenzhao Zheng
- Papers: 28
Authored papers (28)
UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
arXiv 2026
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
arXiv 2025
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
arXiv 2025
DVGT: Driving Visual Geometry Transformer
arXiv 2025
Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
arXiv 2025
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
arXiv 2025
SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
arXiv 2025
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
arXiv 2025
Streaming 4D Visual Geometry Transformer
arXiv 2025
Latent Diffusion Model without Variational Autoencoder
arXiv 2025
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control
arXiv 2025
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference
arXiv 2024
Training-free Regional Prompting for Diffusion Transformers
arXiv 2024
GenAD: Generative End-to-End Autonomous Driving
arXiv 2024
Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data
arXiv 2024
Preventing Local Pitfalls in Vector Quantization via Optimal Transport
arXiv 2024
Path Choice Matters for Clear Attribution in Path Methods
arXiv 2024
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
CVPR 2025
Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving
arXiv 2024
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
ICCV 2025
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
arXiv 2024
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
CVPR 2023
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
ICCV 2023
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
arXiv 2022
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
ICCV 2023
Token-Label Alignment for Vision Transformers
ICCV 2023
Affiliations
Frequent co-authors
10 from 28 papers