☆271 · Apr 22, 2026 · Updated 2 months ago
Alternatives and similar repositories for cybench
Users that are interested in cybench are comparing it to the libraries listed below.
- The goal of this repo is to become a benchmark for pentesting ☆23 · Oct 25, 2024 · Updated last year
- The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning. ☆41 · Sep 8, 2025 · Updated 9 months ago
- ☆11 · Dec 19, 2024 · Updated last year
- The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench ☆151 · Oct 25, 2025 · Updated 8 months ago
- ☆304 · Updated this week
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025) ☆83 · Mar 1, 2025 · Updated last year
- Holistic Concolic Execution for Dynamic Web Applications via Symbolic Interpreter Analysis (IEEE S&P 2024) ☆17 · Oct 3, 2024 · Updated last year
- This repo contains the codes of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆87 · Oct 28, 2025 · Updated 8 months ago
- ☆157 · Sep 22, 2025 · Updated 9 months ago
- AIxCC: automated vulnerability repair via LLMs, search, and static analysis ☆13 · Jul 16, 2024 · Updated last year
- CS-Eval is a comprehensive evaluation suite for fundamental cybersecurity models or large language models' cybersecurity ability. ☆65 · Nov 27, 2024 · Updated last year
- XBOW Validation Benchmarks ☆644 · Jun 18, 2025 · Updated last year
- ☆30 · Jun 19, 2023 · Updated 3 years ago
- CyberBench: A Multi-Task Cyber LLM Benchmark ☆35 · Apr 29, 2025 · Updated last year
- A Web Platform API proposal for Blob URL ☆11 · Feb 24, 2023 · Updated 3 years ago
- Reading and sharing of cutting-edge work related to software engineering and formal methods ☆36 · Oct 27, 2025 · Updated 8 months ago
- Public Source code Release of Theori's AIxCC AFC Submission ☆272 · Aug 5, 2025 · Updated 10 months ago
- ☆66 · Sep 13, 2025 · Updated 9 months ago
- [NeurIPS'24, Spotlight] CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence ☆87 · May 7, 2026 · Updated last month
- A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. ☆640 · Jun 2, 2026 · Updated last month
- ☆45 · Jan 30, 2023 · Updated 3 years ago
- Security Vulnerability Repair via Concolic Execution and Code Mutations ☆21 · Sep 12, 2024 · Updated last year
- FUGIO: Automatic Exploit Generation for PHP Object Injection Vulnerabilities ☆99 · Nov 27, 2023 · Updated 2 years ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆163 · Nov 30, 2024 · Updated last year
- CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities ☆245 · Jan 14, 2026 · Updated 5 months ago
- An Inspect extension for agentic cyber evaluations ☆31 · Jun 18, 2026 · Updated 2 weeks ago
- LLM agent solving traces, leaderboards, and benchmark results across security CTF and hacking platforms ☆75 · Jun 22, 2026 · Updated last week
- ☆13 · Jan 30, 2025 · Updated last year
- VHTest ☆16 · Oct 31, 2024 · Updated last year
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ☆15 · Apr 27, 2023 · Updated 3 years ago
- An online AI security course created by UChicago's XLab ☆37 · Feb 21, 2026 · Updated 4 months ago
- ☆10 · Dec 4, 2024 · Updated last year
- Code implementation for paper AbsenceBench: Language Models Can't Tell What's Missing ☆19 · Oct 23, 2025 · Updated 8 months ago
- CyberMetric dataset ☆126 · May 27, 2026 · Updated last month
- AlgZoo: uninterpreted models with fewer than 1,500 parameters ☆48 · Jan 19, 2026 · Updated 5 months ago
- Industrial Cybersecurity Conference Index ☆13 · Mar 11, 2024 · Updated 2 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models ☆17 · Jul 17, 2024 · Updated last year
- ☆11 · Oct 13, 2020 · Updated 5 years ago
- ☆27 · Oct 6, 2024 · Updated last year