CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities

Active

Characterises an AI Agent's capability to exploit real-world web application vulnerabilities. Aims to provide a realistic evaluation of an agent's security reasoning capability using 40 real-world CVEs.

Domain
Cybersecurity
License
MIT
Published
Nov 2025
Notable for
Benchmark for evaluating how well AI agents can exploit real-world web application vulnerabilities.


FAQ

What is CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities?
It characterises an AI agent's capability to exploit real-world web application vulnerabilities, and aims to provide a realistic evaluation of an agent's security reasoning capability using 40 real-world CVEs.
What license is CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities under?
CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities is available under the MIT license.