Attribution
Are Generative Models Underconfident? An Embarrassingly Simple Quality Estimation Approach
arXiv 2025
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
arXiv 2024
from 2 papers
Jan Niehues
Alexander Waibel
Carlos Mullov
Carsten Dachsbacher
Danni Liu
Fabian Ternava
Jianfeng Gao
Jueun Lee
Klemens Böhm
Leonard Bärmann