CS-EVAL / CS-Eval
CS-Eval is a comprehensive evaluation suite for assessing the cybersecurity capabilities of foundation models and large language models.
☆39 · Updated 4 months ago
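As a rough illustration of the kind of evaluation such a suite performs, the sketch below scores a model on a file of multiple-choice cybersecurity questions. The JSON layout and the `ask_model` stub are hypothetical placeholders for this example, not CS-Eval's actual data format or API.

```python
# Minimal multiple-choice evaluation loop (illustrative sketch only).
# The {"question", "choices", "answer"} layout and ask_model() are
# assumptions for this example, not CS-Eval's real format or interface.
import json

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an HTTP request to your model)."""
    return "A"  # hypothetical stub: always answers "A"

def evaluate(path: str) -> float:
    with open(path, encoding="utf-8") as f:
        questions = json.load(f)  # assumed: list of {"question", "choices", "answer"}
    correct = 0
    for q in questions:
        options = "\n".join(f"{label}. {text}" for label, text in q["choices"].items())
        prompt = f"{q['question']}\n{options}\nAnswer with a single letter."
        reply = ask_model(prompt).strip().upper()
        correct += reply.startswith(q["answer"].upper())
    return correct / len(questions)

if __name__ == "__main__":
    print(f"accuracy: {evaluate('cs_eval_sample.json'):.2%}")
```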
Alternatives and similar repositories for CS-Eval:
Users who are interested in CS-Eval are comparing it to the libraries listed below.
- Agent Security Bench (ASB) ☆75 · Updated 3 weeks ago
- ☆33 · Updated 9 months ago
- The D-CIPHER and NYU CTF baseline LLM Agents built for NYU CTF Bench ☆65 · Updated 2 weeks ago
- An Execution Isolation Architecture for LLM-Based Agentic Systems ☆70 · Updated 2 months ago
- CyberMetric dataset ☆80 · Updated 3 months ago
- The repository of the paper "HackMentor: Fine-Tuning Large Language Models for Cybersecurity". ☆116 · Updated 10 months ago
- Fudan BaiZe (白泽) LLM safety benchmark suite (Summer 2024 edition) ☆36 · Updated 8 months ago
- [USENIX Security'24] Official repository of "Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise a… ☆86 · Updated 6 months ago
- ☆46 · Updated last month
- ☆93 · Updated last year
- ☆25 · Updated last year
- ☆59 · Updated 5 months ago
- SC-Safety: a multi-round adversarial safety benchmark for Chinese large language models ☆132 · Updated last year
- ☆38 · Updated 6 months ago
- This project aims to consolidate and share high-quality resources and tools across the cybersecurity domain. ☆185 · Updated 3 months ago
- ☆93 · Updated last month
- The open-source repository of FuzzLLM ☆25 · Updated 11 months ago
- ☆33 · Updated 6 months ago
- ☆59 · Updated 9 months ago
- Repository for "SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques" publis… ☆67 · Updated last year
- ☆44 · Updated 11 months ago
- Awesome Large Language Models for Vulnerability Detection ☆62 · Updated this week
- Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024] ☆212 · Updated 10 months ago
- ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors [EMNLP 2024 Findings] ☆185 · Updated 6 months ago
- A curated list of awesome resources about LLM supply chain security (including papers, security reports and CVEs) ☆66 · Updated 3 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024) ☆135 · Updated 4 months ago
- ☆67 · Updated last month
- This repo contains the code of the penetration test benchmark for Generative Agents presented in the paper "AutoPenBench: Benchmarking G… ☆26 · Updated 6 months ago
- Benchmark data from the article "AutoPT: How Far Are We from End2End Automated Web Penetration Testing?" ☆13 · Updated 5 months ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆45 · Updated 6 months ago