TUD-ARTS-2023 / LLM-red-teaming-promptsLinks
LLM red teaming datasets from the paper 'Student-Teacher Prompting for Red Teaming to Improve Guardrails' for the ART of Safety Workshop 2023
☆18Updated 2 years ago
Alternatives and similar repositories for LLM-red-teaming-prompts
Users that are interested in LLM-red-teaming-prompts are comparing it to the libraries listed below
Sorting:
- autoredteam: code for training models that automatically red team other language models☆15Updated 2 years ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed.☆114Updated this week
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"☆51Updated last year
- S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models☆106Updated 2 months ago
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆19Updated 9 months ago
- Test LLMs against jailbreaks and unprecedented harms☆36Updated last year
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique☆18Updated last year
- The opensoure repository of FuzzLLM☆34Updated last year
- PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to a…☆443Updated last year
- Synthesizing realistic and diverse text-datasets from augmented LLMs☆16Updated 8 months ago
- Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs☆299Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆93Updated 7 months ago
- Papers about red teaming LLMs and Multimodal models.☆158Updated 6 months ago
- ☆49Updated last year
- ☆40Updated last month
- Contains all assets to run with Moonshot Library (Connectors, Datasets and Metrics)☆39Updated this week
- [ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"☆96Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment☆109Updated last year
- LLM evaluation.☆16Updated 2 years ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆70Updated last year
- CyberMetric dataset☆110Updated 11 months ago
- ☆191Updated 2 years ago
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies☆28Updated last year
- ☆38Updated last year
- Agent Security Bench (ASB)☆157Updated last month
- [NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuoh…☆13Updated 2 years ago
- ☆48Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)☆48Updated 11 months ago
- Code and data of the EMNLP 2022 paper "Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversaria…☆67Updated 2 years ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆100Updated last year