Botian Shi

Papers
24

Authored papers

24

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

arXiv 2026

2026

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

OmniCaptioner: One Captioner to Rule Them All

arXiv 2025

2025

Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering

arXiv 2025

2025

TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

arXiv 2025

2025

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

CVPR 2025 1

2024

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

arXiv 2024

2024

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

arXiv 2024

2024

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving

arXiv 2024

2024

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

arXiv 2024

2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

arXiv 2024

2024

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

arXiv 2024

2024

DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

ICCV 2023 1

2023

ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation

arXiv 2023

2023

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

arXiv 2023

2023

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

arXiv 2023

2023

SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification

arXiv 2023

2023

LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

CVPR 2023 1

2023

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

arXiv 2023

2023

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 24 papers