Jiwen Lu

Papers
55

Attribution: affiliations and profile from Semantic Scholar

Authored papers

Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

arXiv 2026

2026

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

arXiv 2026

2026

D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection

arXiv 2025

2025

Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding

CVPR 2025

2025

GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting

arXiv 2025

2025

DVGT: Driving Visual Geometry Transformer

arXiv 2025

2025

Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning

arXiv 2025

2025

SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

arXiv 2025

2025

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

arXiv 2025

2025

Streaming 4D Visual Geometry Transformer

arXiv 2025

2025

Latent Diffusion Model without Variational Autoencoder

arXiv 2025

2025

UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting

2025

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs

ICCV 2025

2025

OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View

arXiv 2025

2025

Ola: Pushing the Frontiers of Omni-Modal Language Model

arXiv 2025

2025

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

arXiv 2025

2025

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

arXiv 2024

2024

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

arXiv 2024

2024

Q-VLM: Post-training Quantization for Large Vision-Language Models

arXiv 2024

2024

ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation

arXiv 2024

2024

FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

arXiv 2024

2024

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding

ICCV 2025

2024

Bridging the Divide: Reconsidering Softmax and Linear Attention

arXiv 2024

2024

DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

arXiv 2024

2024

DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery

CVPR 2024

2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

arXiv 2024

2024

XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation

arXiv 2024

2024

Embodied Instruction Following in Unknown Environments

arXiv 2024

2024

Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression

CVPR 2024

2024

EmbodiedSAM: Online Segment Any 3D Thing in Real Time

arXiv 2024

2024

Preventing Local Pitfalls in Vector Quantization via Optimal Transport

arXiv 2024

2024

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction

CVPR 2025

2024

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

arXiv 2024

2024

Path Choice Matters for Clear Attribution in Path Methods

arXiv 2024

2024

OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception

ICCV 2023

2023

Unleashing Text-to-Image Diffusion Models for Visual Perception

2023

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

CVPR 2023

2023

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

2023

Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

ICCV 2023

2023

Efficient Meshy Neural Fields for Animatable Human Avatars

arXiv 2023

2023

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

ICCV 2023

2023

UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models

2023

Segment and Caption Anything

CVPR 2024

2023

Embodied Task Planning with Large Language Models

arXiv 2023

2023

TCOVIS: Temporally Consistent Online Video Instance Segmentation

ICCV 2023

2023

HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions

arXiv 2022

2022

SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation

CVPR 2022

2022

OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions

ICCV 2023

2022

Token-Label Alignment for Vision Transformers

ICCV 2023

2022

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

arXiv 2022

2022

Diffusion-SDF: Text-to-Shape via Voxelized Diffusion

CVPR 2023

2022

P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

arXiv 2022

2022

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

CVPR 2022

2021

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

CVPR 2022

2021

An Improved Evaluation Framework for Generative Adversarial Networks

arXiv 2018

2018

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 55 papers)