Attribution
QEDBENCH: Quantifying the Alignment Gap in Automated Evaluation of University-Level Mathematical Proofs
arXiv 2026
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
arXiv 2023
Authors from 2 papers
Alan Goldfarb
Alireza Amiri Bavandpour
Ana Mickovic
Andres Miniguano-Trujillo
Annika Kanckos
Antoine Moulin
Ariane M. Masuda
Arman Cohan
Connor Stewart
Detchat Samart