Wentong Li
- Papers: 10
Authored papers
InstructSAM: Segment Any Instance with Any Instructions
arXiv 2026
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration
arXiv 2026
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
CVPR 2025 1
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
arXiv 2025
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
arXiv 2025
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
CVPR 2025 1
TokenPacker: Efficient Visual Projector for Multimodal LLM
arXiv 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
CVPR 2025 1
Osprey: Pixel Understanding with Visual Instruction Tuning
CVPR 2024 1
H2RBox: Horizontal Box Annotation is All You Need for Oriented Object Detection
arXiv 2022