Yanwei Li
- Papers: 16
Authored papers (16)
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
Semantic Generative Tuning for Unified Multimodal Models
arXiv 2026
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
arXiv 2025
Seed1.5-VL Technical Report
arXiv 2025
Visual Spatial Tuning
arXiv 2025
DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World
arXiv 2025
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
arXiv 2025
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
arXiv 2025
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models
arXiv 2025
LLaVA-OneVision: Easy Visual Task Transfer
arXiv 2024
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
arXiv 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
ICCV 2025
LISA: Reasoning Segmentation via Large Language Model
CVPR 2024 1
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
arXiv 2023
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
NeurIPS 2023 11
Focal Sparse Convolutional Networks for 3D Object Detection
CVPR 2022 1
Frequent co-authors
10 from 16 papers