CS-EVAL / CS-Eval
CS-Eval is a comprehensive evaluation suite for assessing the cybersecurity capabilities of foundation models and large language models.
☆34 · Updated 2 months ago
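For context on what a suite like this does, below is a minimal sketch of scoring a model on CS-Eval-style multiple-choice items. The file name, item schema, and the stub `query_model` are illustrative assumptions, not CS-Eval's actual interface.

```python
# Hypothetical sketch: scoring a model on CS-Eval-style multiple-choice items.
# The file name, item schema, and stub query_model() are illustrative
# assumptions, not CS-Eval's actual API.
import json

def query_model(question: str, choices: dict[str, str]) -> str:
    # Stub baseline that always answers with the first option letter.
    # A real run would format the question and choices into a prompt and
    # return the letter chosen by the model under evaluation.
    return sorted(choices)[0]

def evaluate(path: str) -> float:
    # Assumed schema: [{"question": str, "choices": {"A": ...}, "answer": "A"}, ...]
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    correct = sum(query_model(it["question"], it["choices"]) == it["answer"]
                  for it in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"accuracy: {evaluate('cs_eval_sample.json'):.2%}")
```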
Alternatives and similar repositories for CS-Eval:
Users interested in CS-Eval are comparing it to the repositories listed below.
- SC-Safety: a multi-round adversarial safety benchmark for Chinese large language models ☆119 · Updated 11 months ago
- CyberMetric dataset ☆68 · Updated last month
- ☆34 · Updated 4 months ago
- Fudan 白泽 (Whitzard) large language model safety benchmark dataset (Summer 2024 edition) ☆32 · Updated 6 months ago
- An Execution Isolation Architecture for LLM-Based Agentic Systems ☆62 · Updated 3 weeks ago
- ☆32 · Updated 7 months ago
- ☆34 · Updated 2 weeks ago
- SecLLMHolmes is a generalized, fully automated, and scalable framework to systematically evaluate the performance (i.e., accuracy and reasoning capabilities) of LLMs for vulnerability detection ☆46 · Updated 3 months ago
- This is a dataset intended to train an LLM for completely CVE-focused input and output ☆49 · Updated 2 months ago
- ☆52 · Updated 7 months ago
- The repository of the paper "HackMentor: Fine-Tuning Large Language Models for Cybersecurity" ☆109 · Updated 8 months ago
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆190 · Updated 7 months ago
- Agent Security Bench (ASB) ☆62 · Updated last week
- ☆44 · Updated 9 months ago
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" ☆58 · Updated last year
- ☆86 · Updated 10 months ago
- The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench ☆49 · Updated 2 weeks ago
- ☆77 · Updated 2 months ago
- A curated list of awesome resources about LLM supply chain security (including papers, security reports and CVEs) ☆38 · Updated last month
- ☆25 · Updated last year
- This project aims to consolidate and share high-quality resources and tools across the cybersecurity domain. ☆129 · Updated last month
- 🪐 A Database of Existing Security Vulnerabilities Patches to Enable Evaluation of Techniques (single-commit; multi-language) ☆37 · Updated 2 years ago
- Papers about red-teaming LLMs and multimodal models ☆96 · Updated 3 months ago
- A safety evaluation benchmark for Chinese LLMs based on 文心一言 (ERNIE Bot), covering 8 typical safety scenarios and 6 types of instruction attack. It also proposes a framework and process for safety evaluation, using test prompts that are manually written and collected from open-source data, and combining human intervention with the LLM's strong evaluation capability as a "co-evaluator". ☆22 · Updated last year
- ☆34 · Updated 3 months ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction" ☆68 · Updated 4 months ago
- ☆18 · Updated 3 months ago
- [USENIX Security '24] An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection ☆36 · Updated 3 months ago
- ☆28 · Updated 5 months ago
- [NDSS'25 Poster] A collection of automated evaluators for assessing jailbreak attempts ☆110 · Updated this week