0

ClinicalGPT-R1: Pushing reasoning capability of generalist disease diagnosis with large language model

ClinicalGPT-R1, a reasoning-enhanced LLM trained on real-world clinical records, outperforms GPT-4 in Chinese and matches it in English for disease diagnosis tasks.

Year
2025
Venue
arXiv 2025
Authors
8
Hosting
Abstract onlyARXIV-DEFAULT

Cite

Notes

Only stored in your browser.

Attribution

Abstract & full text
arxiv.org/abs/2504.09421v2ARXIV-DEFAULT
TL;DR
Semantic Scholar
Attribution policy →

Abstract

Recent advances in reasoning with large language models (LLMs)has shown remarkable reasoning capabilities in domains such as mathematics and coding, yet their application to clinical diagnosis remains underexplored. Here, we introduce ClinicalGPT-R1, a reasoning enhanced generalist large language model for disease diagnosis. Trained on a dataset of 20,000 real-world clinical records, ClinicalGPT-R1 leverages diverse training strategies to enhance diagnostic reasoning. To benchmark performance, we curated MedBench-Hard, a challenging dataset spanning seven major medical specialties and representative diseases. Experimental results demonstrate that ClinicalGPT-R1 outperforms GPT-4o in Chinese diagnostic tasks and achieves comparable performance to GPT-4 in English settings. This comparative study effectively validates the superior performance of ClinicalGPT-R1 in disease diagnosis tasks. Resources are available at https://github.com/medfound/medfound.

Authors

8