Attribution
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
arXiv 2026
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?
arXiv 2025
from 3 papers
Xin Lan
Xuandong Zhao
Yuanli Wang
Ahson Saiyed
Akshay Anand
Alex Dimakis
Alexander G. Shaw
Andrew Lanpouthakoun
Andy Konwinski (founder)
Anurag Kashyap