Papers about red teaming LLMs and Multimodal models.
☆160May 28, 2025Updated 9 months ago
Alternatives and similar repositories for OpenRedTeaming
Users who are interested in OpenRedTeaming are comparing it to the repositories listed below
- ☆23May 20, 2025Updated 9 months ago
- ☆25Mar 16, 2025Updated 11 months ago
- MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols☆30Sep 24, 2025Updated 5 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]☆377Jan 23, 2025Updated last year
- An easy-to-use Python framework to generate adversarial jailbreak prompts.☆820Mar 27, 2025Updated 11 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal☆864Aug 16, 2024Updated last year
- A repo for generating random NFTs with metadata 100% on chain!☆37Mar 8, 2024Updated last year
- [ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…☆430Jan 22, 2025Updated last year
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…☆341Feb 23, 2024Updated 2 years ago
- Q&A dataset for many-shot jailbreaking☆14Jul 19, 2024Updated last year
- ☆25Sep 3, 2025Updated 6 months ago
- Component Services Volatile Environment LPE☆12Jun 28, 2025Updated 8 months ago
- 🤖🛡️🔍🔒🔑 Tiny package designed to support red teams and penetration testers in exploiting large language model AI solutions.☆26May 16, 2024Updated last year
- ☆122Feb 3, 2025Updated last year
- one-time use token phishing toolkit☆12May 30, 2020Updated 5 years ago
- The repository describing a novel dataset for word association explanations☆13Sep 21, 2023Updated 2 years ago
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last month
- A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).☆1,870Feb 23, 2026Updated last week
- ☆164Sep 2, 2024Updated last year
- ☆698Jul 2, 2025Updated 8 months ago
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)☆163Nov 30, 2024Updated last year
- ☆28Sep 21, 2024Updated last year
- ☆12Oct 23, 2022Updated 3 years ago
- AIBOM Workshop RSA 2024☆15May 20, 2024Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆28Mar 14, 2024Updated last year
- A prompt injection game to collect data for robust ML research☆68Jan 27, 2025Updated last year
- Winterfell hunt is a Python script to perform auto threat hunting for malicious activities in Windows OS based on collected data by winte…☆15Jul 23, 2020Updated 5 years ago
- [NDSS'25 Best Technical Poster] A collection of automated evaluators for assessing jailbreak attempts.☆187Apr 1, 2025Updated 11 months ago
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization☆29Jul 9, 2024Updated last year
- A library for red-teaming LLM applications with LLMs.☆29Oct 11, 2024Updated last year
- FeatureAlignment = Alignment + Mechanistic Interpretability☆34Mar 8, 2025Updated 11 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"☆16Mar 31, 2025Updated 11 months ago
- ☆12May 6, 2024Updated last year
- HackAgent is an open-source security toolkit to detect vulnerabilities of your AI Agents☆37Updated this week
- small language models training made easy☆13Dec 15, 2024Updated last year
- Top 10 for Agentic AI (AI Agent Security) serves as the core for OWASP and CSA Red teaming work☆172Oct 7, 2025Updated 4 months ago
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide…☆1,783Feb 1, 2026Updated last month
- Papers and resources related to the security and privacy of LLMs 🤖☆568Jun 8, 2025Updated 8 months ago
- Official GitHub repo for ACLUE, an evaluation benchmark focused on ancient Chinese language comprehension☆33Mar 20, 2024Updated last year