CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆237 · Jan 14, 2026 · Updated 5 months ago
Alternatives and similar repositories for cve-bench
Users that are interested in cve-bench are comparing it to the libraries listed below.
- ☆97 · Mar 6, 2026 · Updated 3 months ago
- The goal of this repo is to become a benchmark for pentesting ☆23 · Oct 25, 2024 · Updated last year
- PentestAgent is a novel LLM-driven penetration testing framework to automate intelligence gathering, vulnerability analysis, and exploita… ☆127 · Dec 20, 2025 · Updated 5 months ago
- Fuzzing Automatic Differentiation in Deep-Learning Libraries (ICSE'23) ☆27 · Mar 2, 2024 · Updated 2 years ago
- ☆75 · Dec 8, 2025 · Updated 6 months ago
- A continuously updated collection of papers on agentic SE maintained by PurCL group @ Purdue ☆630 · Apr 23, 2026 · Updated last month
- ☆57 · Jul 31, 2025 · Updated 10 months ago
- The notes about programming language theory ☆27 · May 7, 2023 · Updated 3 years ago
- ☆39 · Nov 13, 2025 · Updated 7 months ago
- An autonomous LLM-agent for large-scale, repository-level code auditing ☆406 · Mar 12, 2026 · Updated 3 months ago
- [42-b3yond-6ug] This repository hosts BugBuster, our team’s submission to the AI Cyber Challenge Final Competition. ☆30 · Aug 19, 2025 · Updated 9 months ago
- A subset of CTF challenges I have made over the years. ☆18 · Aug 4, 2022 · Updated 3 years ago
- The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning. ☆40 · Sep 8, 2025 · Updated 9 months ago
- This is the replication package of V-SZZ, which has been accepted by ICSE2022 ☆15 · Jan 19, 2026 · Updated 4 months ago
- Code snippets to reproduce MCP tool poisoning attacks. ☆195 · Apr 10, 2025 · Updated last year
- Ownership analysis that helps translating C to Rust ☆32 · Apr 9, 2026 · Updated 2 months ago
- CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics ☆22 · Mar 25, 2026 · Updated 2 months ago
- ☆155 · Sep 22, 2025 · Updated 8 months ago
- Modelizer - is a framework for learning models from BlackBox systems using Input-Output examples ☆22 · Apr 9, 2026 · Updated 2 months ago
- This repo contains the codes for the experiments of the paper "AutoPenBench: Benchmarking Generative Agents for Penetration Testing". ☆16 · Oct 28, 2025 · Updated 7 months ago
- Autonomous Assumed Breach Penetration-Testing Active Directory Networks ☆125 · May 11, 2026 · Updated last month
- This repo contains the codes of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆86 · Oct 28, 2025 · Updated 7 months ago
- Parsing-based Analyzer ☆77 · Jun 8, 2025 · Updated last year
- Automated web vulnerability scanning with LLM agents ☆470 · Jun 18, 2025 · Updated last year
- ☆12 · Nov 30, 2018 · Updated 7 years ago
- A learning-guided approach for executing arbitrary Python code snippets ☆16 · Mar 4, 2024 · Updated 2 years ago
- ☆57 · Oct 4, 2024 · Updated last year
- https://arxiv.org/abs/2412.02776 ☆70 · Dec 5, 2024 · Updated last year
- SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and rea… ☆65 · May 4, 2025 · Updated last year
- A manually vetted dataset for security vulnerability detection in Java projects ☆105 · Aug 12, 2025 · Updated 10 months ago
- YASA is an open-source static program analysis project. Its core innovation lies in a unified intermediate representation called UAST, d… ☆290 · May 7, 2026 · Updated last month
- Capture the flag with Large Language Models ☆35 · Jun 8, 2026 · Updated last week
- A simple Joern MCP Server. ☆44 · Apr 17, 2026 · Updated 2 months ago
- A benchmark for Java gadget chain detecting algorithms. ☆16 · Jun 20, 2025 · Updated 11 months ago
- CodeGuard+: Constrained Decoding for Secure Code Generation ☆22 · Jul 30, 2024 · Updated last year
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on… ☆418 · May 18, 2026 · Updated last month
- Effective ReDoS Detection by Principled Vulnerability Modeling and Exploit Generation ☆15 · Jul 24, 2025 · Updated 10 months ago
- Automated Benchmarking of LLM Agents on Real-World Software Security Tasks [NeurIPS 2025] ☆77 · Jan 27, 2026 · Updated 4 months ago
- CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph ☆145 · Feb 5, 2025 · Updated last year