☆196 Dec 13, 2025 Updated 2 months ago
Alternatives and similar repositories for cybench
Users who are interested in cybench compare it to the repositories listed below.
- The goal of this repo is to become a benchmark for pentesting ☆19 Oct 25, 2024 Updated last year
- ☆11 Dec 19, 2024 Updated last year
- A benchmark for Java gadget chain detecting algorithms. ☆15 Jun 20, 2025 Updated 8 months ago
- ☆229 Updated this week
- Constructing community of LLM-based Agent in the minecraft ☆16 Nov 27, 2025 Updated 3 months ago
- Useful Windows and AD tools ☆15 Feb 20, 2022 Updated 4 years ago
- The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench ☆131 Oct 25, 2025 Updated 4 months ago
- Testability Tarpits: the Impact of Code Patterns on the Security Testing of Web Applications (NDSS 2022) ☆27 Feb 14, 2024 Updated 2 years ago
- Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025) ☆76 Mar 1, 2025 Updated last year
- This repo contains the codes of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆64 Oct 28, 2025 Updated 4 months ago
- Reading and sharing of cutting-edge work related to software engineering and formal methods ☆36 Oct 27, 2025 Updated 4 months ago
- ☆118 Sep 22, 2025 Updated 5 months ago
- AI agent for autonomous cyber operations ☆487 Nov 29, 2025 Updated 3 months ago
- The repository of Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning. ☆27 Sep 8, 2025 Updated 5 months ago
- ☆81 Feb 11, 2026 Updated 2 weeks ago
- Extracts IoCs, TTPs and the relationships between them. Outputs a STIX 2.1 bundle. ☆79 Feb 4, 2026 Updated 3 weeks ago
- ☆66 Sep 13, 2025 Updated 5 months ago
- mcp wrapper for openai built-in tools ☆12 Mar 13, 2025 Updated 11 months ago
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs ☆23 Sep 21, 2025 Updated 5 months ago
- AIxCC: automated vulnerability repair via LLMs, search, and static analysis ☆11 Jul 16, 2024 Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 Jun 22, 2025 Updated 8 months ago
- ☆10 Nov 24, 2018 Updated 7 years ago
- The Super Vulnerable Java Application (SVJA), as demonstrated in the Roniel and DaRon Podcast Show, is an Apache Struts application desig… ☆13 Jan 1, 2026 Updated 2 months ago
- XBOW Validation Benchmarks ☆495 Jun 18, 2025 Updated 8 months ago
- ☆27 Oct 6, 2024 Updated last year
- WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆160 May 29, 2025 Updated 9 months ago
- FUGIO: Automatic Exploit Generation for PHP Object Injection Vulnerabilities ☆98 Nov 27, 2023 Updated 2 years ago
- CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on… ☆118 Feb 23, 2026 Updated last week
- ☆27 Feb 19, 2024 Updated 2 years ago
- ☆11 Oct 13, 2020 Updated 5 years ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 Nov 6, 2023 Updated 2 years ago
- The rev.ng demos ☆13 Jan 29, 2026 Updated last month
- ☆10 Feb 16, 2025 Updated last year
- ☆13 Mar 22, 2024 Updated last year
- interactive command line interfaces for Python ☆13 Jan 3, 2021 Updated 5 years ago
- The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed … ☆11 Sep 27, 2024 Updated last year
- ☆23 Jul 24, 2024 Updated last year
- ☆53 Sep 5, 2024 Updated last year
- ☆31 May 1, 2025 Updated 10 months ago