Benyou Wang
- Papers: 60
Authored papers
MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation
arXiv 2026
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
arXiv 2026
LiveClin: A Live Clinical Benchmark without Leakage
arXiv 2026
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
arXiv 2026
Project Imaging-X: A Survey of 1000+ Open-Access Medical Imaging Datasets for Foundation Model Development
arXiv 2026
Do Phone-Use Agents Respect Your Privacy?
arXiv 2026
ClinAlign: Scaling Healthcare Alignment from Clinician Preference
arXiv 2026
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers
arXiv 2025
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation
arXiv 2025
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion
arXiv 2025
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
arXiv 2025
MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos
arXiv 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
arXiv 2025
Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization
arXiv 2025
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
arXiv 2025
Learning from Peers in Reasoning Models
arXiv 2025
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information
arXiv 2025
CoRT: Code-integrated Reasoning within Thinking
arXiv 2025
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs
arXiv 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
arXiv 2025
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
arXiv 2025
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
arXiv 2025
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
arXiv 2025
BlenderLLM: Training Large Language Models for Computer-Aided Design with Self-improvement
arXiv 2024
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
arXiv 2024
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale
arXiv 2024
ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling
arXiv 2024
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
arXiv 2024
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
arXiv 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
arXiv 2024
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
arXiv 2024
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs
arXiv 2024
ALLaVA: Harnessing GPT4V-Synthesized Data for Lite Vision-Language Models
arXiv 2024
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
arXiv 2024
No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks
arXiv 2024
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions
arXiv 2024
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
arXiv 2024
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
arXiv 2024
Mamo: a Mathematical Modeling Benchmark with Solvers
arXiv 2024
LLMs Could Autonomously Learn Without External Supervision
arXiv 2024
Is Your LLM Outdated? Evaluating LLMs at Temporal Generalization
arXiv 2024
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
arXiv 2024
Humans or LLMs as the Judge? A Study on Judgement Biases
arXiv 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
arXiv 2024
Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People
arXiv 2024
Mixture of Latent Experts Using Tensor Products
arXiv 2024
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
arXiv 2024
Rethinking The Uniformity Metric in Self-Supervised Learning
arXiv 2024
CMB: A Comprehensive Medical Benchmark in Chinese
arXiv 2023
Huatuo-26M, a Large-scale Chinese Medical QA Dataset
arXiv 2023
HuatuoGPT, towards Taming Language Model to Be a Doctor
arXiv 2023
Phoenix: Democratizing ChatGPT across Languages
arXiv 2023
Natural Language Reasoning, A Survey
arXiv 2023
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
arXiv 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
arXiv 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
arXiv 2023
AceGPT, Localizing Large Language Models in Arabic
arXiv 2023
Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts
ICCV 2023 1
Word Grounded Graph Convolutional Network
arXiv 2023
DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
COLING 2022 10