Steven Dillmann

Attribution

3 papers

Authored papers

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

arXiv 2026

SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks

arXiv 2026

ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers?

arXiv 2025

No known affiliations.

Co-authors (from 3 papers)

Xin Lan

Xuandong Zhao

Yuanli Wang

Ahson Saiyed

Akshay Anand

Alex Dimakis

Alexander G. Shaw

Andrew Lanpouthakoun

Andy Konwinski

founder

Anurag Kashyap