sunblaze-ucb/cybergym

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/sunblaze-ucb/cybergym)

sunblaze-ucb / cybergym

CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks.

☆122

Alternatives and similar repositories for cybergym

Users that are interested in cybergym are comparing it to the libraries listed below

Sorting:

USCHunt-Anon / USCHunt
View on GitHub
Anonymous repo for USCHunt, a tool for detecting and classifying upgradeable proxy smart contracts, built atop Slither
☆22Apr 2, 2023Updated 2 years ago
TransluceAI / jailbreaking-frontier-models
View on GitHub
☆26Sep 3, 2025Updated 6 months ago
science-of-finetuning / sparsity-artifacts-crosscoders
View on GitHub
Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper.
☆16Nov 21, 2025Updated 3 months ago
n132 / ARVO-Meta
View on GitHub
☆92Oct 23, 2025Updated 4 months ago
qcri / llmxcpg
View on GitHub
Source code for LLMxCPG paper
☆121Feb 10, 2026Updated last month
nus-apr / CrashRepair
View on GitHub
Security Vulnerability Repair via Concolic Execution and Code Mutations
☆19Sep 12, 2024Updated last year
android-app-sast / VulsTotal
View on GitHub
A Unified Platform for Evaluating SAST Tools for Android
☆19Mar 30, 2025Updated 11 months ago
xsscx / xnuimagefuzzer
View on GitHub
XNU Image Fuzzer - iOS App for Fuzzing Images with Objective-C Code covering 15 CGCreateBitmap & CGColorSpace Functions working with Raw …
☆40Updated this week
AIxCyberChallenge / sherpa
View on GitHub
Security Harness Engineering for Robust Program Analysis
☆115Jan 23, 2026Updated last month
uq-project / UQ
View on GitHub
UQ: Assessing Language Models on Unsolved Questions
☆30Aug 26, 2025Updated 6 months ago
chensokolovsky / iosEnv
View on GitHub
docker env for ios research on a mac host
☆28Jun 12, 2025Updated 8 months ago
ai4cloudops / SecLLMHolmes
View on GitHub
SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and rea…
☆64May 4, 2025Updated 10 months ago
iris-sast / cwe-bench-java
View on GitHub
A manually vetted dataset for security vulnerability detection in Java projects
☆92Aug 12, 2025Updated 6 months ago
gitbugactions / gitbug-java
View on GitHub
A Reproducible Benchmark of Recent Java Bugs
☆47Aug 19, 2025Updated 6 months ago
PurCL / LLMSCAN
View on GitHub
Parsing-based Analyzer
☆71Jun 8, 2025Updated 9 months ago
AI-secure / AdvAgent
View on GitHub
☆22May 28, 2025Updated 9 months ago
Eshe0922 / ReposVul
View on GitHub
[ICSE'24 Industry Challenge Track] "ReposVul: A Repository-Level High-Quality Vulnerability Dataset".
☆93Nov 24, 2024Updated last year
infosecak / DVBE
View on GitHub
Damn Vulnerable Browser Extension (DVBE), previously named as Badly Coded Browser Extension (BCBE), is an open-source vulnerable Chrome E…
☆33Mar 4, 2025Updated last year
eth-sri / sven
View on GitHub
☆128Jul 14, 2024Updated last year
InPlusLab / ReentrancyStudy-Data
View on GitHub
☆49Jan 14, 2025Updated last year
SunLab-GMU / GraphSPD
View on GitHub
The official repository of "GraphSPD: Graph-Based Security Patch Detection with Enriched Code Semantics". The paper will appear in the IE…
☆49Aug 9, 2023Updated 2 years ago
yaof20 / ReaL
View on GitHub
Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback"
☆42Jul 21, 2025Updated 7 months ago
NASP-THU / ProphetFuzz
View on GitHub
[CCS'24] An LLM-based, fully automated fuzzing tool for option combination testing.
☆102Feb 10, 2026Updated last month
pattern-f / presentations
View on GitHub
☆22Sep 26, 2023Updated 2 years ago
SimaArasteh / binpool
View on GitHub
☆71Jul 24, 2025Updated 7 months ago
andyzorigin / cybench
View on GitHub
☆203Dec 13, 2025Updated 2 months ago
Co1lin / CWEval
View on GitHub
Simultaneous evaluation on both functionality and security of LLM-generated code.
☆32Updated this week
secdim / play-sdk
View on GitHub
SDK for building SecDim Play challenges, an open training game for AppSec, DevSecOps, CloudSec, etc.
☆30Aug 7, 2025Updated 7 months ago
PurCL / ASTRA
View on GitHub
🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati…
☆70Aug 14, 2025Updated 6 months ago
tuhh-softsec / vul4j
View on GitHub
Vul4J: A Dataset of Reproducible Java Vulnerabilities
☆123Sep 2, 2025Updated 6 months ago
FoRTE-Research / HeXcite
View on GitHub
High-Efficiency eXpanded Coverage for Improved Testing of Executables
☆25Jul 7, 2022Updated 3 years ago
ethz-spylab / unlearning-vs-safety
View on GitHub
☆26Oct 6, 2024Updated last year
niklasrisse / LimitsOfML4Vuln
View on GitHub
☆25Feb 6, 2024Updated 2 years ago
antoniozekic / Proof-of-concepts
View on GitHub
☆29Apr 7, 2023Updated 2 years ago
uiuc-kang-lab / cve-bench
View on GitHub
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆167Jan 14, 2026Updated last month
nuaa-nlp / TrustworthyAIPapers
View on GitHub
List of Papers on Attack and Defense (AD) in AI Models
☆27Mar 18, 2022Updated 3 years ago
yunuscadirci / DIALStranger
View on GitHub
details about DIAL protocol vulnerabilities
☆29Nov 24, 2023Updated 2 years ago
vishnugamini / LocalLLMAgent
View on GitHub
This tool allows local LLM usage that can automate tasks without human interventention. The agent can call itself recursively and work on…
☆20May 5, 2025Updated 10 months ago
ElectrovoltSec / HackBench
View on GitHub
How effective are LLMs in identifying and exploiting security vulnerabilities?
☆67Feb 28, 2025Updated last year