andyzorigin/cybench

☆288

Alternatives and similar repositories for cybench

Users interested in cybench are comparing it to the repositories listed below.

bountybench / bountybench
☆100 · Jul 24, 2025 · Updated last year
uiuc-kang-lab / cve-bench
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆259 · Jan 14, 2026 · Updated 6 months ago
NYU-LLM-CTF / NYU_CTF_Bench
☆162 · Sep 22, 2025 · Updated 10 months ago
lucagioacchini / auto-pen-bench
This repo contains the code of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G…
☆90 · Oct 28, 2025 · Updated 8 months ago
sunblaze-ucb / cybergym
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on…
☆574 · Jul 9, 2026 · Updated 2 weeks ago
amazon-science / CTF-Dojo
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
☆56 · Jan 10, 2026 · Updated 6 months ago
xbow-engineering / validation-benchmarks
XBOW Validation Benchmarks
☆675 · Jul 7, 2026 · Updated 2 weeks ago
isamu-isozaki / AI-Pentest-Benchmark
The goal of this repo is to become a benchmark for pentesting
☆24 · Oct 25, 2024 · Updated last year
amazon-science / Cyber-Zero
Cyber-Zero: Training Cybersecurity Agents Without Runtime
☆100 · Feb 13, 2026 · Updated 5 months ago
NYU-LLM-CTF / nyuctf_agents
The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench
☆154 · Jul 17, 2026 · Updated last week
usnistgov / caisi-cyber-evals
☆17 · Jan 6, 2026 · Updated 6 months ago
sunblaze-ucb / cybergym-e2e
CyberGym-E2E is a large-scale benchmark built from real-world vulnerabilities in widely used open-source projects to evaluate AI agents' …
☆29 · Jun 25, 2026 · Updated last month
UKGovernmentBEIS / inspect_cyber
An Inspect extension for agentic cyber evaluations
☆38 · Jun 18, 2026 · Updated last month
RyuKosei / PACEbench
☆35 · Oct 14, 2025 · Updated 9 months ago
simon-p-j-r / LLM4Pentest
☆317 · Jul 17, 2026 · Updated last week
KHenryAegis / Pentest-R1
The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning.
☆45 · Sep 8, 2025 · Updated 10 months ago
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆602 · Updated this week
jpmorganchase / CyberBench
CyberBench: A Multi-Task Cyber LLM Benchmark
☆35 · Apr 29, 2025 · Updated last year
SORRY-Bench / sorry-bench
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
☆83 · Mar 1, 2025 · Updated last year
pwncollege / ctf-archive
This is a comprehensive collection of challenges from past CTF competitions. The challenges are stored with REHOST details and can be run…
☆106 · Jun 22, 2026 · Updated last month
ChrisTimperley / RepairChain
AIxCC: automated vulnerability repair via LLMs, search, and static analysis
☆13 · Jul 16, 2024 · Updated 2 years ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
theori-io / aixcc-afc-archive
Public source code release of Theori's AIxCC AFC submission
☆270 · Aug 5, 2025 · Updated 11 months ago
tmylla / Awesome-LLM4Cybersecurity
An overview of LLMs for cybersecurity.
☆1,720 · Jul 8, 2026 · Updated 2 weeks ago
r4wd3r / ADPWN
Useful Windows and AD tools
☆15 · Feb 20, 2022 · Updated 4 years ago
arthurgervais / mapta
We present MAPTA, a multi-agent system for autonomous web application security assessment that combines large language model orchestratio…
☆105 · Aug 28, 2025 · Updated 10 months ago
exploitbench / exploitbench
ExploitBench measures how far AI agents climb, from reaching vulnerable code, to triggering the bug, to building exploit primitives, to a…
☆314 · Jul 4, 2026 · Updated 3 weeks ago
arthurgervais / validation-benchmarks
XBOW Validation Benchmarks
☆21 · Aug 17, 2025 · Updated 11 months ago
sunblaze-ucb / exploitgym
ExploitGym is a large-scale, realistic benchmark built from real-world vulnerabilities designed to evaluate AI agents' ability to develop…
☆445 · Updated this week
Team-Atlanta / aixcc-afc-atlantis
☆627 · Nov 25, 2025 · Updated 8 months ago
0ca / BoxPwnr
A modular framework for benchmarking LLMs and agentic strategies on security challenges across HackTheBox, TryHackMe, PortSwigger Labs, C…
☆434 · Updated this week
cyb3rlab / PenGym
PenGym: Pentesting Training Framework for Reinforcement Learning Agents
☆59 · Dec 19, 2024 · Updated last year
secbench-git / SecBench
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity
☆19 · Jan 8, 2025 · Updated last year
apartresearch / DarkBench
Benchmarking Dark Patterns in LLMs (ICLR 2025)
☆18 · Mar 29, 2025 · Updated last year
andreashappe / cochise
Autonomous Assumed Breach Penetration-Testing Active Directory Networks
☆129 · Jun 17, 2026 · Updated last month
CS-EVAL / CS-Eval
CS-Eval is a comprehensive evaluation suite for foundational cybersecurity models and for the cybersecurity abilities of large language models.
☆65 · Nov 27, 2024 · Updated last year
google / acjs
☆11 · Dec 19, 2024 · Updated last year
ForAllSecure / GraphFuzz
GraphFuzz is an experimental framework for building structure-aware, library API fuzzers.
☆10 · Apr 21, 2022 · Updated 4 years ago
AIxCyberChallenge / sherpa
Security Harness Engineering for Robust Program Analysis
☆138 · Jan 23, 2026 · Updated 6 months ago