Bo Zhang
- Papers
- 65
Authored papers
InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery
arXiv 2026
Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision
arXiv 2026
AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
DeepSight: An All-in-One LM Safety Toolkit
arXiv 2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
arXiv 2025
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
arXiv 2025
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
arXiv 2025
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
ICCV 2025
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
arXiv 2025
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
arXiv 2025
MotifBench: A standardized protein design benchmark for motif-scaffolding problems
arXiv 2025
OmniCaptioner: One Captioner to Rule Them All
arXiv 2025
Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
arXiv 2025
GaussianDWM: 3D Gaussian Driving World Model for Unified Scene Understanding and Multi-Modal Generation
arXiv 2025
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
arXiv 2025
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research
arXiv 2025
TempFlow-GRPO: When Timing Matters for GRPO in Flow Models
arXiv 2025
OmniGen2: Exploration to Advanced Multimodal Generation
arXiv 2025
Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning
arXiv 2025
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
arXiv 2025
ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking
arXiv 2025
Distribution Matching Distillation Meets Reinforcement Learning
arXiv 2025
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
arXiv 2025
SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
arXiv 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
arXiv 2025
Evaluating Intelligence via Trial and Error
arXiv 2025
TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
arXiv 2025
MLVU: Benchmarking Multi-task Long Video Understanding
CVPR 2025 1
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations
CVPR 2025 1
DeepSeek-VL: Towards Real-World Vision-Language Understanding
arXiv 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
arXiv 2024
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
CVPR 2025 1
OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving
arXiv 2024
Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
arXiv 2024
DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check
arXiv 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
arXiv 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
arXiv 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
arXiv 2024
LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection
arXiv 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
arXiv 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
arXiv 2024
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
arXiv 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
CVPR 2024 1
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
arXiv 2023
ReSimAD: Zero-Shot 3D Domain Transfer for Autonomous Driving with Source Reconstruction and Target Simulation
arXiv 2023
OWL: A Large Language Model for IT Operations
arXiv 2023
Performance-aware Approximation of Global Channel Pruning for Multitask CNNs
arXiv 2023
Lenna: Language Enhanced Reasoning Detection Assistant
arXiv 2023
YOLOv6 v3.0: A Full-Scale Reloading
arXiv 2023
Make-It-3D: High-Fidelity 3D Creation from A Single Image with Diffusion Prior
ICCV 2023 1
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts
arXiv 2023
Foreground Object Search by Distilling Composite Image Feature
ICCV 2023 1
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
arXiv 2023
Paint by Example: Exemplar-based Image Editing with Diffusion Models
CVPR 2023 1
Pretraining is All You Need for Image-to-Image Translation
arXiv 2022
Aspect-specific Context Modeling for Aspect-based Sentiment Analysis
arXiv 2022
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
NeurIPS 2021 12
Vector Quantized Diffusion Model for Text-to-Image Synthesis
CVPR 2022 1
Making Images Real Again: A Comprehensive Survey on Deep Image Composition
arXiv 2021
Conditional Positional Encodings for Vision Transformers
arXiv 2021
OPA: Object Placement Assessment Dataset
arXiv 2021
StyleSwin: Transformer-based GAN for High-resolution Image Generation
CVPR 2022
Old Photo Restoration via Deep Latent Space Translation
arXiv 2020
MixPath: A Unified Approach for One-shot Neural Architecture Search
ICCV 2023 1
Affiliations
Frequent co-authors
10 from 65 papers