neomatrix369

Attribution

4 tool contribs

Tool contributions

Async/concurrency output prediction — Level 2 harder-than-SWE-bench eval

Python output prediction eval: trace subtly broken code — harder-than-SWE-bench reasoning

Python output prediction eval: trace subtly broken code — harder-than-SWE-bench reasoning

API-bug fixing — pytest-style pass/fail scoring — Level 3 py-bug-trace eval

No known affiliations.