CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆179 · Jan 14, 2026 · Updated 2 months ago
Alternatives and similar repositories for cve-bench
Users who are interested in cve-bench are comparing it to the repositories listed below.
- ☆94 · Mar 6, 2026 · Updated 3 weeks ago
- The goal of this repo is to become a benchmark for pentesting ☆22 · Oct 25, 2024 · Updated last year
- PentestAgent is a novel LLM-driven penetration testing framework to automate intelligence gathering, vulnerability analysis, and exploita… ☆121 · Dec 20, 2025 · Updated 3 months ago
- [VLDB'2025] LEAP: LLM-powered End-to-end Automatic Library for Processing Social Science Queries on Unstructured Data ☆19 · Nov 3, 2025 · Updated 4 months ago
- Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) ☆27 · Mar 2, 2024 · Updated 2 years ago
- ☆66 · Dec 8, 2025 · Updated 3 months ago
- A continuously updated collection of CodeLLM papers maintained by PurCL group @ Purdue ☆614 · Jan 14, 2026 · Updated 2 months ago
- ☆37 · Nov 13, 2025 · Updated 4 months ago
- An autonomous LLM-agent for large-scale, repository-level code auditing ☆363 · Mar 12, 2026 · Updated 2 weeks ago
- [42-b3yond-6ug] This repository hosts BugBuster, our team’s submission to the AI Cyber Challenge Final Competition. ☆30 · Aug 19, 2025 · Updated 7 months ago
- Autonomous Assumed Breach Penetration-Testing Active Directory Networks ☆43 · Updated this week
- A subset of CTF challenges I have made over the years. ☆18 · Aug 4, 2022 · Updated 3 years ago
- The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning. ☆30 · Sep 8, 2025 · Updated 6 months ago
- ☆127 · Sep 22, 2025 · Updated 6 months ago
- Code snippets to reproduce MCP tool poisoning attacks. ☆191 · Apr 10, 2025 · Updated 11 months ago
- CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics ☆20 · Updated this week
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on… ☆179 · Feb 23, 2026 · Updated last month
- Modelizer is a framework for learning models from black-box systems using input-output examples ☆22 · Jul 17, 2025 · Updated 8 months ago
- This repo contains the code for the experiments of the paper "AutoPenBench: Benchmarking Generative Agents for Penetration Testing". ☆14 · Oct 28, 2025 · Updated 5 months ago
- This repo contains the code of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆71 · Oct 28, 2025 · Updated 5 months ago
- Parsing-based Analyzer ☆75 · Jun 8, 2025 · Updated 9 months ago
- Automated web vulnerability scanning with LLM agents ☆458 · Jun 18, 2025 · Updated 9 months ago
- ☆57 · Oct 4, 2024 · Updated last year
- A learning-guided approach for executing arbitrary Python code snippets ☆16 · Mar 4, 2024 · Updated 2 years ago
- ☆12 · Nov 30, 2018 · Updated 7 years ago
- https://arxiv.org/abs/2412.02776 ☆70 · Dec 5, 2024 · Updated last year
- A manually vetted dataset for security vulnerability detection in Java projects ☆94 · Aug 12, 2025 · Updated 7 months ago
- A simple Joern MCP Server. ☆37 · Nov 14, 2025 · Updated 4 months ago
- Automated Benchmarking of LLM Agents on Real-World Software Security Tasks [NeurIPS 2025] ☆62 · Jan 27, 2026 · Updated 2 months ago
- A benchmark for Java gadget chain detecting algorithms. ☆15 · Jun 20, 2025 · Updated 9 months ago
- CodeGuard+: Constrained Decoding for Secure Code Generation ☆20 · Jul 30, 2024 · Updated last year
- ☆24 · Jan 15, 2026 · Updated 2 months ago
- A polyglot static analysis engine, based on Joern, for detecting vulnerabilities in scripting languages' native extensions. ☆21 · Sep 1, 2025 · Updated 6 months ago
- Effective ReDoS Detection by Principled Vulnerability Modeling and Exploit Generation ☆15 · Jul 24, 2025 · Updated 8 months ago
- The official repository of "GraphSPD: Graph-Based Security Patch Detection with Enriched Code Semantics". The paper will appear in the IE… ☆49 · Aug 9, 2023 · Updated 2 years ago
- ☆211 · Dec 13, 2025 · Updated 3 months ago
- CAShift: Benchmarking Log-Based Cloud Attack Detection under Normality Shift (FSE 2025) ☆13 · May 19, 2025 · Updated 10 months ago
- Holistic Concolic Execution for Dynamic Web Applications via Symbolic Interpreter Analysis (IEEE S&P 2024) ☆15 · Oct 3, 2024 · Updated last year
- ☆128 · Jul 14, 2024 · Updated last year