CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on real-world vulnerability analysis tasks.
☆405 · May 18, 2026 · Updated 3 weeks ago
Alternatives and similar repositories for cybergym
Users who are interested in cybergym are comparing it to the libraries listed below.
- Progent: Securing AI Agents with Privilege Control ☆36 · May 14, 2026 · Updated 3 weeks ago
- ☆96 · Mar 6, 2026 · Updated 3 months ago
- ☆10 · May 14, 2024 · Updated 2 years ago
- SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and rea… ☆65 · May 4, 2025 · Updated last year
- Implementation and datasets for "Training Language Models to Generate Quality Code with Program Analysis Feedback" ☆41 · Jul 21, 2025 · Updated 10 months ago
- Anonymous repo for USCHunt, a tool for detecting and classifying upgradeable proxy smart contracts, built atop Slither ☆23 · Apr 2, 2023 · Updated 3 years ago
- Source code for LLMxCPG paper ☆148 · Mar 26, 2026 · Updated 2 months ago
- ☆26 · Sep 3, 2025 · Updated 9 months ago
- Training Language Model Agents to Find Vulnerabilities with CTF-Dojo ☆49 · Jan 10, 2026 · Updated 5 months ago
- Parsing-based Analyzer ☆76 · Jun 8, 2025 · Updated last year
- Ghidra decompiler in your browser ☆114 · May 4, 2026 · Updated last month
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆21 · Sep 12, 2024 · Updated last year
- CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities ☆234 · Jan 14, 2026 · Updated 4 months ago
- ☆26 · Jan 7, 2024 · Updated 2 years ago
- Simultaneous evaluation on both functionality and security of LLM-generated code. ☆38 · Mar 6, 2026 · Updated 3 months ago
- Security Harness Engineering for Robust Program Analysis ☆133 · Jan 23, 2026 · Updated 4 months ago
- [ICSE'24 Industry Challenge Track] "ReposVul: A Repository-Level High-Quality Vulnerability Dataset". ☆106 · Nov 24, 2024 · Updated last year
- A manually vetted dataset for security vulnerability detection in Java projects ☆104 · Aug 12, 2025 · Updated 9 months ago
- WebSocket Penetration Testing Toolkit for Burp Suite ☆30 · Mar 5, 2026 · Updated 3 months ago
- ☆130 · Jul 14, 2024 · Updated last year
- How effective are LLMs in identifying and exploiting security vulnerabilities? ☆71 · Feb 28, 2025 · Updated last year
- tool of llm-based indirect-call analyzer ☆31 · Feb 18, 2025 · Updated last year
- Cyber-Zero: Training Cybersecurity Agents Without Runtime ☆93 · Feb 13, 2026 · Updated 3 months ago
- 🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati… ☆72 · May 11, 2026 · Updated last month
- A minimal LLM-powered zero-day vulnerability scanner by AISLE. ☆271 · Apr 14, 2026 · Updated last month
- Resources for our ICSE'24 poster: Prompt-Enhanced Software Vulnerability Detection Using ChatGPT. ☆25 · May 8, 2024 · Updated 2 years ago
- ☆25 · May 28, 2025 · Updated last year
- A Reproducible Benchmark of Recent Java Bugs ☆50 · Aug 19, 2025 · Updated 9 months ago
- ☆614 · Nov 25, 2025 · Updated 6 months ago
- [CCS'24] An LLM-based, fully automated fuzzing tool for option combination testing. ☆101 · Feb 10, 2026 · Updated 4 months ago
- XNU Image Fuzzer - iOS App for Fuzzing Images with Objective-C Code covering 15 CGCreateBitmap & CGColorSpace Functions working with Raw … ☆41 · Jun 1, 2026 · Updated last week
- [NDSS 2025] "CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models" ☆26 · Aug 20, 2025 · Updated 9 months ago
- ExploitBench measures how far AI agents climb, from reaching vulnerable code, to triggering the bug, to building exploit primitives, to a… ☆232 · May 16, 2026 · Updated 3 weeks ago
- MegaVul - The largest, high-quality, extensible, continuously updated, C/C++/Java vulnerability dataset ☆150 · Jan 12, 2025 · Updated last year
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis… ☆92 · Nov 4, 2023 · Updated 2 years ago
- ☆28 · Apr 28, 2023 · Updated 3 years ago
- ☆41 · Jan 13, 2023 · Updated 3 years ago
- docker env for ios research on a mac host ☆27 · Jun 12, 2025 · Updated 11 months ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling ☆35 · Nov 8, 2024 · Updated last year