Attribution
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
arXiv 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Authors (from 2 papers)
Summer Yue (researcher)
Adam Khoja
Alice Gatti
Arunim Agarwal
Brad Kenstler
Chen Xing
Cristina Menghini
Dan Hendrycks (director)
Ed-Yeremai Cardona
Eduardo Trevino