CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities
Active
Characterises an AI Agent's capability to exploit real-world web application vulnerabilities. Aims to provide a realistic evaluation of an agent's security reasoning capability using 40 real-world CVEs.
- Domain: Cybersecurity
- License: MIT
- Published: Nov 2025
- Notable for: Benchmark for evaluating AI agents' cybersecurity capabilities.
FAQ
- What is CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities?
- Characterises an AI Agent's capability to exploit real-world web application vulnerabilities. Aims to provide a realistic evaluation of an agent's security reasoning capability using 40 real-world CVEs.
- What license is CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities under?
- CVEBench: Benchmark for AI Agents Ability to Exploit Real-World Web Application Vulnerabilities is available under the MIT license.