Min Dou
- Papers: 13
Authored papers (13)
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
arXiv 2024
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
arXiv 2024
OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
arXiv 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
arXiv 2024
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
arXiv 2023
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
arXiv 2023
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
arXiv 2023
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
arXiv 2023
Affiliations
Frequent co-authors
10 from 13 papers