We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in condensed matter physics. CMPhysBench comprises more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, and strongly correlated systems. To ensure a deep understanding of the problem-solving process, we focus exclusively on calculation problems, requiring LLMs to independently generate comprehensive solutions. Leveraging tree-based representations of expressions, we also introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of the similarity between prediction and ground truth. Our results show that even the best model, Grok-4, reaches only an average SEED score of 36 and 28% accuracy on CMPhysBench, underscoring a significant capability gap, especially in this practical and frontier domain relative to traditional physics. The code and dataset are publicly available at https://github.com/CMPhysBench/CMPhysBench.
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
CMPhysBench evaluates LLMs in condensed matter physics using calculation problems and a new SEED score for partial credit assessment, revealing significant capability gaps.
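To make the idea of tree-based partial credit concrete, here is a minimal sketch of an edit-distance score over expression trees. It is not the authors' SEED implementation: the function names (`to_tree`, `tree_distance`, `similarity_score`), the simplified top-down edit distance, the restriction to Python-syntax arithmetic, and the normalization by the larger tree size are all assumptions chosen for illustration.

```python
import ast


def to_tree(expr):
    """Parse a Python-syntax arithmetic expression into (label, children) tuples."""
    def convert(node):
        if isinstance(node, ast.BinOp):
            return (type(node.op).__name__, [convert(node.left), convert(node.right)])
        if isinstance(node, ast.UnaryOp):
            return (type(node.op).__name__, [convert(node.operand)])
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            return (node.func.id, [convert(arg) for arg in node.args])
        if isinstance(node, ast.Name):
            return (node.id, [])
        if isinstance(node, ast.Constant):
            return (repr(node.value), [])
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return convert(ast.parse(expr, mode="eval").body)


def size(tree):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


def tree_distance(a, b):
    """Simplified top-down edit distance between two ordered trees.

    Relabeling a node costs 1; inserting or deleting a whole subtree
    costs its node count. Children are aligned left to right.
    """
    relabel = 0 if a[0] == b[0] else 1
    xs, ys = a[1], b[1]
    # dp[i][j] = cost of turning the first i children of a into the first j of b
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(xs[i - 1]),                       # delete subtree
                dp[i][j - 1] + size(ys[j - 1]),                       # insert subtree
                dp[i - 1][j - 1] + tree_distance(xs[i - 1], ys[j - 1]),  # edit in place
            )
    return relabel + dp[-1][-1]


def similarity_score(prediction, truth):
    """Score in [0, 100]: 100 for identical trees, lower as edits accumulate."""
    a, b = to_tree(prediction), to_tree(truth)
    largest = max(size(a), size(b))
    return 100.0 * max(0, largest - tree_distance(a, b)) / largest
```

Under these assumptions, a prediction that differs from the reference by a single operator, such as `a*b - c` against `a*b + c`, keeps most of its credit, whereas an exact-match metric would score it zero. This sketch compares syntax only; it does not recognize algebraically equivalent forms such as `a*b` and `b*a`.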
- Year
- 2025
- Venue
- arXiv 2025
- Authors
- 35
- Hosting
- Abstract only
Attribution
- Abstract & full text
- arxiv.org/abs/2508.18124v3
- TL;DR
- Semantic Scholar
Authors
Lei Bai, Yuqiang Li, Dongzhan Zhou, Wanli Ouyang, Di Zhang, Shufei Zhang, Jin Zeng, Xin Li, Mao Su, Jiatong Li, Weida Wang, Zeke Xie, Zhihao Dou, QianJia Cheng, Dongchen Huang, Tengchao Yang, Ziyang Zheng, Dong Han, Benteng Chen, Binzhao Luo, Zhiyu Liu, Kunling Liu, Zhiyuan Gao, Shiqi Geng, Wei Ma, Jiaming Su, Shuchen Pu, Yuhan Shui, Dongfei Cui, Changyong He, Yunqi Cai, Xi Dai, Jinguang Cheng, Zhong Fang, Hongming Weng