Botian Shi
- Papers: 24
Authored papers (24)
The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
arXiv 2026
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
arXiv 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
arXiv 2025
O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
arXiv 2025
TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
arXiv 2025
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
arXiv 2024
DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
arXiv 2024
OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving
arXiv 2024
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
arXiv 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
arXiv 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
arXiv 2024
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
ICCV 2023
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
arXiv 2023
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
arXiv 2023
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
arXiv 2023
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
arXiv 2023
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
CVPR 2023
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
arXiv 2023
UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
arXiv 2020
Frequent co-authors
10 from 24 papers