☆235 · Apr 22, 2026 · Updated last week
Alternatives and similar repositories for cybench
Users that are interested in cybench are comparing it to the libraries listed below.
- The goal of this repo is to become a benchmark for pentesting ☆22 · Oct 25, 2024 · Updated last year
- The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning. ☆34 · Sep 8, 2025 · Updated 7 months ago
- Useful Windows and AD tools ☆15 · Feb 20, 2022 · Updated 4 years ago
- A benchmark for Java gadget chain detecting algorithms. ☆16 · Jun 20, 2025 · Updated 10 months ago
- AI agent for autonomous cyber operations ☆521 · Nov 29, 2025 · Updated 5 months ago
- ☆11 · Dec 19, 2024 · Updated last year
- The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench ☆141 · Oct 25, 2025 · Updated 6 months ago
- ☆81 · Jul 24, 2025 · Updated 9 months ago
- Holistic Concolic Execution for Dynamic Web Applications via Symbolic Interpreter Analysis (IEEE S&P 2024) ☆16 · Oct 3, 2024 · Updated last year
- This repo contains the codes of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆81 · Oct 28, 2025 · Updated 6 months ago
- ☆140 · Sep 22, 2025 · Updated 7 months ago
- A Web Platform API proposal for Blob URL ☆10 · Feb 24, 2023 · Updated 3 years ago
- AIxCC: automated vulnerability repair via LLMs, search, and static analysis ☆12 · Jul 16, 2024 · Updated last year
- CS-Eval is a comprehensive evaluation suite for fundamental cybersecurity models or large language models' cybersecurity ability. ☆62 · Nov 27, 2024 · Updated last year
- CyberBench: A Multi-Task Cyber LLM Benchmark ☆32 · Apr 29, 2025 · Updated last year
- The Super Vulnerable Java Application (SVJA), as demonstrated in the Roniel and DaRon Podcast Show, is an Apache Struts application desig… ☆13 · Jan 1, 2026 · Updated 4 months ago
- Official GitHub repository for the paper "Adversarial Attacks on Robotic Vision Language Action Models" ☆33 · May 28, 2025 · Updated 11 months ago
- Execute invisible JavaScript by abusing Hangul filler characters. Inspired by Martin Kleppe's INVISIBLE.js. ☆18 · Oct 13, 2024 · Updated last year
- MCP wrapper for OpenAI built-in tools ☆12 · Mar 13, 2025 · Updated last year
- Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications (NDSS 2022) ☆27 · Feb 14, 2024 · Updated 2 years ago
- ☆66 · Sep 13, 2025 · Updated 7 months ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 · Nov 6, 2023 · Updated 2 years ago
- [NeurIPS'24, Spotlight] CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence ☆83 · Feb 11, 2026 · Updated 2 months ago
- An Inspect extension for agentic cyber evaluations ☆27 · Apr 23, 2026 · Updated last week
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆546 · Mar 30, 2026 · Updated last month
- ☆45 · Jan 30, 2023 · Updated 3 years ago
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆19 · Sep 12, 2024 · Updated last year
- FUGIO: Automatic Exploit Generation for PHP Object Injection Vulnerabilities ☆99 · Nov 27, 2023 · Updated 2 years ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated 10 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆169 · May 29, 2025 · Updated 11 months ago
- CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities ☆208 · Jan 14, 2026 · Updated 3 months ago
- ☆13 · Jan 30, 2025 · Updated last year
- VHTest ☆16 · Oct 31, 2024 · Updated last year
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ☆15 · Apr 27, 2023 · Updated 3 years ago
- An online AI security course created by UChicago's XLab ☆31 · Feb 21, 2026 · Updated 2 months ago
- Code implementation for the paper AbsenceBench: Language Models Can't Tell What's Missing ☆19 · Oct 23, 2025 · Updated 6 months ago
- CyberMetric dataset ☆121 · Jan 1, 2025 · Updated last year
- Industrial Cybersecurity Conference Index ☆13 · Mar 11, 2024 · Updated 2 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Jul 17, 2024 · Updated last year