Attribution
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
arXiv 2026
Precise Debugging Benchmark: Is Your Model Debugging or Regenerating?
Scaling Inference-Efficient Language Models
arXiv 2025
from 3 papers
Minghao Yan
Ahson Saiyed
Akshay Anand
Alex Dimakis
Alexander G. Shaw
Andrew Lanpouthakoun
Andy Konwinski (founder)
Anurag Kashyap
Arinbjörn Kolbeinsson
Bardia Koopah