Benyou Wang

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

arXiv 2026

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

arXiv 2026

LiveClin: A Live Clinical Benchmark without Leakage

arXiv 2026

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

arXiv 2026

ClinAlign: Scaling Healthcare Alignment from Clinician Preference

arXiv 2026

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

arXiv 2025

Video-R1: Reinforcing Video Reasoning in MLLMs

arXiv 2025

Soundwave: Less is More for Speech-Text Alignment in LLMs

arXiv 2025

TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets

arXiv 2025

Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization

arXiv 2025

Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

arXiv 2025

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

arXiv 2025

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion

arXiv 2025

Learning from Peers in Reasoning Models

arXiv 2025

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

arXiv 2025

TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

arXiv 2025

MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

arXiv 2025

CoRT: Code-integrated Reasoning within Thinking

arXiv 2025

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

arXiv 2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

arXiv 2025

BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement

arXiv 2024

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

arXiv 2024

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

arXiv 2024

ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling

arXiv 2024

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

arXiv 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

arXiv 2024

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

arXiv 2024

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

arXiv 2024

Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs

arXiv 2024

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

arXiv 2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture

arXiv 2024

No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

arXiv 2024

Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

arXiv 2024

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions

arXiv 2024

Mixture of Latent Experts Using Tensor Products

arXiv 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

arXiv 2024

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

arXiv 2024

Humans or LLMs as the Judge? A Study on Judgement Biases

arXiv 2024

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

arXiv 2024

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

arXiv 2024

Mamo: a Mathematical Modeling Benchmark with Solvers

arXiv 2024

LLMs Could Autonomously Learn Without External Supervision

arXiv 2024

Rethinking The Uniformity Metric in Self-Supervised Learning

arXiv 2024

Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization

arXiv 2024

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

arXiv 2024

CMB: A Comprehensive Medical Benchmark in Chinese

arXiv 2023

Huatuo-26M, a Large-scale Chinese Medical QA Dataset

arXiv 2023

HuatuoGPT, towards Taming Language Model to Be a Doctor

arXiv 2023

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

arXiv 2023

Natural Language Reasoning, A Survey

arXiv 2023

AceGPT, Localizing Large Language Models in Arabic

arXiv 2023

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

ICCV 2023

OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

arXiv 2023

Lifting the Curse of Capacity Gap in Distilling Language Models

arXiv 2023

Word Grounded Graph Convolutional Network

arXiv 2023

Phoenix: Democratizing ChatGPT across Languages

arXiv 2023