Jiwen Lu
- Papers: 55
Authored papers
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
arXiv 2026
UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
arXiv 2026
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
arXiv 2025
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
CVPR 2025
GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
arXiv 2025
DVGT: Driving Visual Geometry Transformer
arXiv 2025
Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
arXiv 2025
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
arXiv 2025
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
arXiv 2025
Streaming 4D Visual Geometry Transformer
arXiv 2025
Latent Diffusion Model without Variational Autoencoder
arXiv 2025
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
ICCV 2025
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View
arXiv 2025
Ola: Pushing the Frontiers of Omni-Modal Language Model
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
arXiv 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
arXiv 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models
arXiv 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
arXiv 2024
FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner
arXiv 2024
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
ICCV 2025
Bridging the Divide: Reconsidering Softmax and Linear Attention
arXiv 2024
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
arXiv 2024
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
CVPR 2024
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
arXiv 2024
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
arXiv 2024
Embodied Instruction Following in Unknown Environments
arXiv 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
arXiv 2024
Preventing Local Pitfalls in Vector Quantization via Optimal Transport
arXiv 2024
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
CVPR 2025
Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving
arXiv 2024
Path Choice Matters for Clear Attribution in Path Methods
arXiv 2024
OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
ICCV 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
CVPR 2023
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
ICCV 2023
Efficient Meshy Neural Fields for Animatable Human Avatars
arXiv 2023
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
ICCV 2023
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
Segment and Caption Anything
CVPR 2024
Embodied Task Planning with Large Language Models
arXiv 2023
TCOVIS: Temporally Consistent Online Video Instance Segmentation
ICCV 2023
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
arXiv 2022
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
CVPR 2022
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
ICCV 2023
Token-Label Alignment for Vision Transformers
ICCV 2023
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
arXiv 2022
Diffusion-SDF: Text-to-Shape via Voxelized Diffusion
CVPR 2023
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
arXiv 2022
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
CVPR 2022
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
CVPR 2022
An Improved Evaluation Framework for Generative Adversarial Networks
arXiv 2018
Affiliations
Frequent co-authors
10 from 55 papers