CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale

Active

CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks. It includes 1,507 benchmark instances based on historical vulnerabilities from 188 large software projects.

Domain
Cybersecurity
License
MIT
Published
Feb 2026
Notable for
Benchmark for evaluating the cybersecurity capabilities of AI agents.

FAQ

What is CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale?
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks. It includes 1,507 benchmark instances based on historical vulnerabilities from 188 large software projects.
What license is CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale under?
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale is available under the MIT License.