Jianke Zhu
- Papers: 20
Authored papers (20)
Unlocking Dense Metric Depth Estimation in VLMs
arXiv 2026
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
arXiv 2026
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
arXiv 2026
WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
arXiv 2026
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
CVPR 2025 1
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
arXiv 2025
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
arXiv 2025
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
arXiv 2025
3D and 4D World Modeling: A Survey
arXiv 2025
RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
arXiv 2025
SAM4D: Segment Anything in Camera and LiDAR Streams
ICCV 2025
PixelThink: Towards Efficient Chain-of-Pixel Reasoning
arXiv 2025
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
CVPR 2025 1
A Coarse-to-Fine Approach to Multi-Modality 3D Occupancy Grounding
arXiv 2025
PianoMotion10M: Dataset and Benchmark for Hand Motion Generation in Piano Performance
arXiv 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
arXiv 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025 1
Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
arXiv 2024
XHand: Real-time Expressive Hand Avatar
arXiv 2024
Osprey: Pixel Understanding with Visual Instruction Tuning
CVPR 2024 1
Frequent co-authors
10 from 20 papers