TUD-ARTS-2023 / LLM-red-teaming-prompts
LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop 2023
☆14 · Updated last year
Alternatives and similar repositories for LLM-red-teaming-prompts
Users interested in LLM-red-teaming-prompts are comparing it to the repositories listed below.
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆104 · Updated last week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆46 · Updated last year
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models ☆96 · Updated 2 months ago
- The open-source repository of FuzzLLM ☆29 · Updated last year
- Test LLMs against jailbreaks and unprecedented harms ☆35 · Updated 11 months ago
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment ☆105 · Updated last year
- ☆34 · Updated last year
- CyberMetric dataset ☆103 · Updated 8 months ago
- ☆46 · Updated last year
- Agent Security Bench (ASB) ☆121 · Updated 3 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs ☆289 · Updated last year
- ☆46 · Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆88 · Updated 4 months ago
- Synthesizing realistic and diverse text-datasets from augmented LLMs ☆15 · Updated 5 months ago
- ☆14 · Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning" ☆155 · Updated 5 months ago
- [ICLR 2024] Data for "Multilingual Jailbreak Challenges in Large Language Models" ☆83 · Updated last year
- Papers about red teaming LLMs and Multimodal models. ☆140 · Updated 3 months ago
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a… ☆424 · Updated last year
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria… ☆56 · Updated 2 years ago
- The official implementation of our pre-print paper "Automatic and Universal Prompt Injection Attacks against Large Language Models". ☆58 · Updated 11 months ago
- ☆13 · Updated 7 months ago
- An LLM can Fool Itself: A Prompt-Based Adversarial Attack (ICLR 2024) ☆99 · Updated 8 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 · Updated last year
- [NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuoh… ☆12 · Updated 2 years ago
- Repository for the paper "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" ☆24 · Updated 4 months ago
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models ☆14 · Updated 3 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models" ☆15 · Updated 6 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆303 · Updated last year
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs ☆91 · Updated 9 months ago