Attribution
AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
arXiv 2026
Beyond Multiple Choice: Verifiable OpenQA for Robust Vision-Language RFT
arXiv 2025
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
from 3 papers
Xi Yang
Yesheng Liu
Chen Yue
Hao Li
JG Yao
Mingxuan Zhao
Zheqi He
Baoqi Pei
Bowen Qin
Fenfen Lin