CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks.
☆122 · Feb 23, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for cybergym
Users who are interested in cybergym are comparing it to the repositories listed below
- Anonymous repo for USCHunt, a tool for detecting and classifying upgradeable proxy smart contracts, built atop Slither ☆22 · Apr 2, 2023 · Updated 2 years ago
- ☆26 · Sep 3, 2025 · Updated 6 months ago
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper. ☆16 · Nov 21, 2025 · Updated 3 months ago
- ☆92 · Oct 23, 2025 · Updated 4 months ago
- Source code for LLMxCPG paper ☆121 · Feb 10, 2026 · Updated last month
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆19 · Sep 12, 2024 · Updated last year
- A Unified Platform for Evaluating SAST Tools for Android ☆19 · Mar 30, 2025 · Updated 11 months ago
- XNU Image Fuzzer - iOS App for Fuzzing Images with Objective-C Code covering 15 CGCreateBitmap & CGColorSpace Functions working with Raw … ☆40 · Updated this week
- Security Harness Engineering for Robust Program Analysis ☆115 · Jan 23, 2026 · Updated last month
- UQ: Assessing Language Models on Unsolved Questions ☆30 · Aug 26, 2025 · Updated 6 months ago
- docker env for ios research on a mac host ☆28 · Jun 12, 2025 · Updated 8 months ago
- SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and rea… ☆64 · May 4, 2025 · Updated 10 months ago
- A manually vetted dataset for security vulnerability detection in Java projects ☆92 · Aug 12, 2025 · Updated 6 months ago
- A Reproducible Benchmark of Recent Java Bugs ☆47 · Aug 19, 2025 · Updated 6 months ago
- Parsing-based Analyzer ☆71 · Jun 8, 2025 · Updated 9 months ago
- ☆22 · May 28, 2025 · Updated 9 months ago
- [ICSE'24 Industry Challenge Track] "ReposVul: A Repository-Level High-Quality Vulnerability Dataset". ☆93 · Nov 24, 2024 · Updated last year
- Damn Vulnerable Browser Extension (DVBE), previously named as Badly Coded Browser Extension (BCBE), is an open-source vulnerable Chrome E… ☆33 · Mar 4, 2025 · Updated last year
- ☆128 · Jul 14, 2024 · Updated last year
- ☆49 · Jan 14, 2025 · Updated last year
- The official repository of "GraphSPD: Graph-Based Security Patch Detection with Enriched Code Semantics". The paper will appear in the IE… ☆49 · Aug 9, 2023 · Updated 2 years ago
- Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback" ☆42 · Jul 21, 2025 · Updated 7 months ago
- [CCS'24] An LLM-based, fully automated fuzzing tool for option combination testing. ☆102 · Feb 10, 2026 · Updated last month
- ☆22 · Sep 26, 2023 · Updated 2 years ago
- ☆71 · Jul 24, 2025 · Updated 7 months ago
- ☆203 · Dec 13, 2025 · Updated 2 months ago
- Simultaneous evaluation on both functionality and security of LLM-generated code. ☆32 · Updated this week
- SDK for building SecDim Play challenges, an open training game for AppSec, DevSecOps, CloudSec, etc. ☆30 · Aug 7, 2025 · Updated 7 months ago
- 🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati… ☆70 · Aug 14, 2025 · Updated 6 months ago
- Vul4J: A Dataset of Reproducible Java Vulnerabilities ☆123 · Sep 2, 2025 · Updated 6 months ago
- High-Efficiency eXpanded Coverage for Improved Testing of Executables ☆25 · Jul 7, 2022 · Updated 3 years ago
- ☆26 · Oct 6, 2024 · Updated last year
- ☆25 · Feb 6, 2024 · Updated 2 years ago
- ☆29 · Apr 7, 2023 · Updated 2 years ago
- CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities ☆167 · Jan 14, 2026 · Updated last month
- List of Papers on Attack and Defense (AD) in AI Models ☆27 · Mar 18, 2022 · Updated 3 years ago
- details about DIAL protocol vulnerabilities ☆29 · Nov 24, 2023 · Updated 2 years ago
- This tool allows local LLM usage that can automate tasks without human intervention. The agent can call itself recursively and work on… ☆20 · May 5, 2025 · Updated 10 months ago
- How effective are LLMs in identifying and exploiting security vulnerabilities? ☆67 · Feb 28, 2025 · Updated last year