ScholarSearch
ScholarSearch is designed to evaluate the complex information retrieval capabilities of Large Language Models (LLMs) in academic research.
- Type: RL Env
- Publisher: General Reasoning
- Runtime: ORS
- License: unknown
- Size: 223 tasks
- Published: Jan 2026
Public scores on this env
66 vf-eval reports across 6 models
- 1. GPT 4o Search Preview (OpenAI): 19.05
- 2. DeepSeek R1 (DeepSeek): 12.38
- 3. GPT 4o Mini Search Preview (OpenAI): 10.48
- 4. GPT-4.1 (OpenAI): 8.57
- 5. GPT 4o (OpenAI): 5.71
- 6. GPT-4o-mini (OpenAI): 3.81