Wenyu Liu
- Papers: 38
Authored papers: 38
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
arXiv 2026
RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
arXiv 2026
4DLangVGGT: 4D Language-Visual Geometry Grounded Transformer
arXiv 2025
MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
arXiv 2025
Visual Generation Tuning
arXiv 2025
Image-Free Timestep Distillation via Continuous-Time Consistency with Trajectory-Sampled Pairs
arXiv 2025
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models
arXiv 2025
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
arXiv 2025
ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
arXiv 2025
AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning
arXiv 2025
PixelHacker: Image Inpainting with Structural and Semantic Consistency
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
ICCV 2025
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
ICCV 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
arXiv 2025
Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation
arXiv 2025
ControlAR: Controllable Image Generation with Autoregressive Models
arXiv 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
arXiv 2024
GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images
arXiv 2024
MoSt-DSA: Modeling Motion and Structural Interactions for Direct Multi-Frame Interpolation in DSA Images
arXiv 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
arXiv 2024
YOLO-World: Real-Time Open-Vocabulary Object Detection
CVPR 2024
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning
arXiv 2024
Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
arXiv 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
CVPR 2025
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
CVPR 2025
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
arXiv 2024
TrackSSM: A General Motion Predictor by State-Space Model
arXiv 2024
LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels
arXiv 2024
SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth
arXiv 2023
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models
CVPR 2024
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
CVPR 2024
Matte Anything: Interactive Natural Image Matting with Segment Anything Models
arXiv 2023
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
arXiv 2022
PD-Quant: Post-Training Quantization based on Prediction Difference Metric
CVPR 2023
Knowledge Mining with Scene Text for Fine-Grained Recognition
CVPR 2022
You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
NeurIPS 2021
ByteTrack: Multi-Object Tracking by Associating Every Detection Box
CCNet: Criss-Cross Attention for Semantic Segmentation
Frequent co-authors: 10 (from 38 papers)