Zhe Chen

Papers
36

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

36

AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization

arXiv 2026

2026

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv 2025

2025

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

arXiv 2025

2025

Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

arXiv 2025

2025

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

arXiv 2025

2025

InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

arXiv 2025

2025

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

arXiv 2025

2025

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

arXiv 2025

2025

MedResearcher-R1: Expert-Level Medical Deep Researcher via A Knowledge-Informed Trajectory Synthesis Framework

arXiv 2025

2025

Sequential Diffusion Language Models

arXiv 2025

2025

Swin-X2S: Reconstructing 3D Shape from 2D Biplanar X-ray with Swin Transformers

arXiv 2025

2025

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

arXiv 2025

2025

RARE: Retrieval-Augmented Reasoning Modeling

arXiv 2025

2025

MedS$^3$: Towards Medical Small Language Models with Self-Evolved Slow Thinking

arXiv 2025

2025

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

arXiv 2024

2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

arXiv 2024

2024

Needle In A Multimodal Haystack

arXiv 2024

2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

arXiv 2024

2024

ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

arXiv 2024

2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

CVPR 2025 1

2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

arXiv 2024

2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks

arXiv 2024

2024

WHU-Synthetic: A Synthetic Perception Dataset for 3-D Multitask Model Research

arXiv 2024

2024

MedCare: Advancing Medical LLMs through Decoupling Clinical Alignment and Knowledge Aggregation

arXiv 2024

2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

arXiv 2024

2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

arXiv 2024

2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

arXiv 2024

2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

arXiv 2024

2024

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

arXiv 2024

2024

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

NeurIPS 2023 11

2023

OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision

ICCV 2023 1

2023

DDP: Diffusion Model for Dense Visual Prediction

ICCV 2023 1

2023

Traffic Flow Optimisation for Lifelong Multi-Agent Path Finding

arXiv 2023

2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

CVPR 2023 1

2022

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

CVPR 2023 1

2022

FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation

arXiv 2021

2021

Affiliations

No known affiliations.

Frequent co-authors

10

from 36 papers