Attribution
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents
arXiv 2026
LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening
from 2 papers
Ming Zhang
Qi Zhang
Shihan Dou
Tao Gui
Xuanjing Huang
Yujiong Shen
Zhiheng Xi
Binze Hu
Huayu Sha
Jiazheng Zhang