Benyou Wang

Authored papers

60

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

arXiv 2026

2026

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

arXiv 2026

2026

LiveClin: A Live Clinical Benchmark without Leakage

arXiv 2026

2026

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

arXiv 2026

2026

Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development

arXiv 2026

2026

Do Phone-Use Agents Respect Your Privacy?

arXiv 2026

2026

ClinAlign: Scaling Healthcare Alignment from Clinician Preference

arXiv 2026

2026

A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers

arXiv 2025

2025

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

arXiv 2025

2025

FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion

arXiv 2025

2025

TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis

arXiv 2025

2025

MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

arXiv 2025

2025

Video-R1: Reinforcing Video Reasoning in MLLMs

arXiv 2025

2025

Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization

arXiv 2025

2025

Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges

arXiv 2025

2025

Learning from Peers in Reasoning Models

arXiv 2025

2025

S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information

arXiv 2025

2025

CoRT: Code-integrated Reasoning within Thinking

arXiv 2025

2025

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

arXiv 2025

2025

Soundwave: Less is More for Speech-Text Alignment in LLMs

arXiv 2025

2025

TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets

arXiv 2025

2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

arXiv 2025

2025

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

arXiv 2025

2025

BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement

arXiv 2024

2024

HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

arXiv 2024

2024

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

arXiv 2024

2024

ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling

arXiv 2024

2024

CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis

arXiv 2024

2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

arXiv 2024

2024

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

arXiv 2024

2024

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

arXiv 2024

2024

Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs

arXiv 2024

2024

ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models

arXiv 2024

2024

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture

arXiv 2024

2024

No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

arXiv 2024

2024

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions

arXiv 2024

2024

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

arXiv 2024

2024

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

arXiv 2024

2024

Mamo: a Mathematical Modeling Benchmark with Solvers

arXiv 2024

2024

LLMs Could Autonomously Learn Without External Supervision

arXiv 2024

2024

Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization

arXiv 2024

2024

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

arXiv 2024

2024

Humans or LLMs as the Judge? A Study on Judgement Biases

arXiv 2024

2024

AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?

arXiv 2024

2024

Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

arXiv 2024

2024

Mixture of Latent Experts Using Tensor Products

arXiv 2024

2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

arXiv 2024

2024

Rethinking The Uniformity Metric in Self-Supervised Learning

arXiv 2024

2024

CMB: A Comprehensive Medical Benchmark in Chinese

arXiv 2023

2023

Huatuo-26M, a Large-scale Chinese Medical QA Dataset

arXiv 2023

2023

HuatuoGPT, towards Taming Language Model to Be a Doctor

arXiv 2023

2023

Phoenix: Democratizing ChatGPT across Languages

arXiv 2023

2023

Natural Language Reasoning, A Survey

arXiv 2023

2023

OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

arXiv 2023

2023

Lifting the Curse of Capacity Gap in Distilling Language Models

arXiv 2023

2023

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

arXiv 2023

2023

AceGPT, Localizing Large Language Models in Arabic

arXiv 2023

2023

Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts

ICCV 2023 1

2023

Word Grounded Graph Convolutional Network

arXiv 2023

2023

DPTDR: Deep Prompt Tuning for Dense Passage Retrieval

COLING 2022 10

2022

Affiliations

No known affiliations.

Frequent co-authors

10, from 60 papers