CS-EVAL / CS-Eval

CS-Eval is a comprehensive evaluation suite for assessing the cybersecurity capabilities of cybersecurity foundation models and large language models.

☆54

Alternatives and similar repositories for CS-Eval

Users who are interested in CS-Eval are comparing it to the repositories listed below.

cybermetric / CyberMetric
CyberMetric dataset
☆106 Updated 9 months ago
XuanwuAI / SecEval
☆106 Updated last year
tmylla / HackMentor
The repository of the paper "HackMentor: Fine-Tuning Large Language Models for Cybersecurity".
☆130 Updated last year
WhitzardIndex / WhitzardBench-2024A
Fudan Whitzard large model safety benchmark test set (Summer 2024 edition)
☆48 Updated last year
uiuc-kang-lab / cve-bench
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
☆106 Updated last week
tuhh-softsec / LLMSecEval
☆50 Updated last year
llm-platform-security / SecGPT
An Execution Isolation Architecture for LLM-Based Agentic Systems
☆95 Updated 9 months ago
LLM-DRA / DRA
[USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a…
☆109 Updated last year
lucagioacchini / auto-pen-bench
This repo contains the code of the penetration testing benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G…
☆45 Updated 2 weeks ago
CSJianYang / SEevenLLM
☆36 Updated last year
CLUEbenchmark / SuperCLUE-Safety
SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models
☆146 Updated last year
sherdencooper / PromptFuzz
☆26 Updated last year
NYU-LLM-CTF / nyuctf_agents
The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench
☆99 Updated this week
rabbitjy / FuzzTuning
☆25 Updated 2 years ago
sherdencooper / GPTFuzz
Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆533 Updated last year
agiresearch / ASB
Agent Security Bench (ASB)
☆137 Updated 3 weeks ago
ai4cloudops / SecLLMHolmes
SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and rea…
☆60 Updated 5 months ago
ddzipp / AutoAudit
AutoAudit: the LLM for Cyber Security
☆352 Updated 8 months ago
aseec-lab / llms-for-code-analysis
☆34 Updated last year
NYU-LLM-CTF / NYU_CTF_Bench
☆94 Updated last month
RainJamesY / FuzzLLM
The open-source repository of FuzzLLM
☆30 Updated last year
PurCL / ASTRA
🥇 Amazon Nova AI Challenge Winner - ASTRA emerged victorious as the top attacking team in Amazon's global AI safety competition, defeati…
☆62 Updated 2 months ago
STAIR-BUPT / JailBench
JailBench: a Chinese dataset for evaluating the jailbreak attack risks of large language models [PAKDD 2025]
☆135 Updated 7 months ago
AIM-Intelligence / Automated-Multi-Turn-Jailbreaks
☆94 Updated 11 months ago
kaichenorg / vulrule
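The vulnerability rule library is an open-source project dedicated to helping developers identify and avoid common security vulnerabilities. We collect, organize, and analyze security vulnerability patterns in various programming languages and commonly used libraries, and provide corresponding preventive measures and best practices.
☆32 Updated 2 months ago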
sunblaze-ucb / cybergym
CyberGym is a large-scale, high-quality cybersecurity evaluation framework designed to rigorously assess the capabilities of AI agents on…
☆83 Updated 3 weeks ago
pasquini-dario / LLMmap
☆74 Updated 3 months ago
liu673 / Awesome-LLM4Security
This project aims to consolidate and share high-quality resources and tools across the cybersecurity domain.
☆266 Updated 2 weeks ago
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆260 Updated 3 months ago
s2e-lab / SecurityEval
Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis…
☆80 Updated last year