Attribution
General Agent Evaluation (arXiv 2026)
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy (arXiv 2025)
Do These LLM Benchmarks Agree? Fixing Benchmark Evaluation with BenchBench (arXiv 2024)
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI
from 4 papers
Michal Shmueli-Scheuer
Asaf Yehudai
Elron Bandel
Leshem Choshen
Ariel Gera
Elad Venezian
Lilach Eden
Ofir Arviv
Yoav Katz
Dafna Sheinwald