xbow-engineering/validation-benchmarks

XBOW Validation Benchmarks

☆671

Alternatives and similar repositories for validation-benchmarks

Users who are interested in validation-benchmarks are comparing it to the repositories listed below.

passer-W / ctfSolver
Entry for the Tencent AI penetration-testing hackathon (xjtuHunter)
☆369 · Dec 4, 2025 · Updated 7 months ago
westonbrown / Cyber-AutoAgent
AI agent for autonomous cyber operations
☆535 · Nov 29, 2025 · Updated 7 months ago
lucagioacchini / auto-pen-bench
This repo contains the codes of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G…
☆88 · Oct 28, 2025 · Updated 8 months ago
Yeti-791 / Tsec-Hackathon
Official repository of the Tencent Cloud Intelligent Penetration Hackathon. Showcasing top open-source projects of LLM-based auton…
☆710 · Updated this week
chainreactors / tinyctfer
antix's baby intent runtime and meta-tooling design.
☆173 · Dec 29, 2025 · Updated 6 months ago
SanMuzZzZz / LuaN1aoAgent
LuaN1aoAgent is a cognitive-driven, fully autonomous AI penetration testing agent powered by dual-graph reasoning. It is developed by the…
☆1,113 · Jul 13, 2026 · Updated last week
yhy0 / CHYing-agent
Tencent Cloud Hackathon: Top 9 in the first Intelligent Penetration Challenge
☆514 · Apr 25, 2026 · Updated 2 months ago
andyzorigin / cybench
☆285 · Jul 9, 2026 · Updated last week
arthurgervais / mapta
We present MAPTA, a multi-agent system for autonomous web application security assessment that combines large language model orchestratio…
☆105 · Aug 28, 2025 · Updated 10 months ago
oritera / Cairn
An AI general-purpose state-space search engine, validated first on autonomous penetration testing.
☆2,040 · Updated this week
Neuro-Sploit / xbow-validation-benchmarks
XBOW Validation Benchmarks
☆17 · Apr 4, 2026 · Updated 3 months ago
wgpsec / hunxiang
Hunxiang (浑象): a CTF range competition platform for AI agents
☆106 · May 31, 2026 · Updated last month
sunblaze-ucb / cybergym
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on…
☆545 · Jul 9, 2026 · Updated last week
Team-Atlanta / aixcc-afc-atlantis
☆626 · Nov 25, 2025 · Updated 7 months ago
KeygraphHQ / xbow-validation-benchmarks
☆34 · Mar 25, 2026 · Updated 3 months ago
Neuro-Sploit / tencent-cloud-hackathon-intelligent-pentest-competition-api-server
☆44 · Dec 8, 2025 · Updated 7 months ago
uiuc-kang-lab / cve-bench
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆254 · Jan 14, 2026 · Updated 6 months ago
arthurgervais / validation-benchmarks
XBOW Validation Benchmarks
☆21 · Aug 17, 2025 · Updated 11 months ago
nbshenxm / pentest-agent
PentestAgent is a novel LLM-driven penetration testing framework to automate intelligence gathering, vulnerability analysis, and exploita…
☆129 · Dec 20, 2025 · Updated 7 months ago
m-sec-org / BreachWeave
Intelligent penetration-testing agent with a multi-role Manager/Observer/Solver architecture, built on the pi-mono SDK.
☆436 · May 6, 2026 · Updated 2 months ago
simon-p-j-r / LLM4Pentest
☆308 · Updated this week
aliasrobotics / cai
Cybersecurity AI (CAI), the framework for AI Security
☆9,485 · Updated this week
NYU-LLM-CTF / nyuctf_agents
The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench
☆153 · Updated this week
Dizzy-K / AutoPT
[IEEE T-IFS] AutoPT: How Far Are We from the Fully Automated Web Penetration Testing?
☆46 · Jun 1, 2026 · Updated last month
m-sec-org / xbow-competition
A complete AI agent solution for automatically solving XBOW challenges, combining an MCP server with an intelligent CLI client to tackle XBOW challenges autonomously
☆58 · Dec 17, 2025 · Updated 7 months ago
KHenryAegis / VulnBot
The repository of VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework.
☆180 · Apr 7, 2025 · Updated last year
PortSwigger / mcp-server
MCP Server for Burp
☆995 · Jun 26, 2026 · Updated 3 weeks ago
l3yx / intentlang
The next-generation AI Agent framework driven by Intent Engineering. Move beyond turn-based Function Calling to embrace code-level intent…
☆91 · Jan 11, 2026 · Updated 6 months ago
pensarai / argus-validation-benchmarks
60 self-contained, Dockerized vulnerable web applications for evaluating AI-powered penetration testing agents. Covers modern tech stac…
☆50 · Updated this week
bountybench / bountybench
☆99 · Jul 24, 2025 · Updated 11 months ago
protectai / vulnhuntr
Zero shot vulnerability discovery using LLMs
☆2,713 · Feb 6, 2025 · Updated last year
GreyDGL / PentestGPT
Automated Penetration Testing Agentic Framework Powered by Large Language Models
☆14,358 · Updated this week
exploitbench / exploitbench
ExploitBench measures how far AI agents climb, from reaching vulnerable code, to triggering the bug, to building exploit primitives, to a…
☆303 · Jul 4, 2026 · Updated 2 weeks ago
tabby-sec / tabby
A CAT called tabby (Code Analysis Tool)
☆1,655 · Jan 17, 2026 · Updated 6 months ago
pixelindigo / yurascanner
YuraScanner
☆83 · Feb 13, 2025 · Updated last year
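lintsinghua / DeepAudit
DeepAudit: an AI hacker team for everyone that puts vulnerability discovery within reach. China's first open-source multi-agent system for code vulnerability discovery. One-click deployment for beginners, with autonomous collaborative auditing plus automated sandboxed PoC verification. Supports private deployment with Ollama, one-click report generation, and API relay services. Makes security affordable and auditing simple.
☆6,675 · Jul 7, 2026 · Updated last week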
C1JC / AgentNote
Source code for "AgentNote: OODA-Driven Autonomous Agents for Iterative Notebook-Based Problem Solving"
☆18 · Dec 11, 2025 · Updated 7 months ago
Tencent / AI-Infra-Guard
A full-stack AI Red Teaming platform securing AI ecosystems via OpenClaw Security Scan, Agent Scan, Skills Scan, MCP scan, AI Infra scan …
☆4,156 · Updated this week
0x4m4 / hexstrike-ai
HexStrike AI MCP Agents is an advanced MCP server that lets AI agents (Claude, GPT, Copilot, etc.) autonomously run 150+ cybersecurity to…
☆10,399 · Apr 27, 2026 · Updated 2 months ago