shreyansh26/Red-Teaming-Language-Models-with-Language-Models

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/shreyansh26/Red-Teaming-Language-Models-with-Language-Models)

shreyansh26 / Red-Teaming-Language-Models-with-Language-Models

A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022

☆35

Alternatives and similar repositories for Red-Teaming-Language-Models-with-Language-Models

Users that are interested in Red-Teaming-Language-Models-with-Language-Models are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jc-ryan / holistic_automated_red_teaming
View on GitHub
[EMNLP 2024] Holistic Automated Red Teaming for Large Language Models through Top-Down Test Case Generation and Multi-turn Interaction
☆17Nov 9, 2024Updated last year
shizhouxing / LLM-Detector-Robustness
View on GitHub
[TACL] Code for "Red Teaming Language Model Detectors with Language Models"
☆24Nov 24, 2023Updated 2 years ago
ZhangZhuoSJTU / LINT
View on GitHub
☆17Sep 4, 2024Updated last year
Improbable-AI / curiosity_redteam
View on GitHub
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆90Mar 15, 2024Updated 2 years ago
declare-lab / ferret
View on GitHub
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
☆19Aug 22, 2024Updated last year
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
shreyansh26 / An-Empirical-Model-of-Large-Batch-Training
View on GitHub
An approximate implementation of the OpenAI paper - An Empirical Model of Large-Batch Training for MNIST
☆11Nov 19, 2022Updated 3 years ago
thestephencasper / explore_establish_exploit_llms
View on GitHub
☆31Jul 14, 2023Updated 3 years ago
SheltonLiu-N / AutoDAN
View on GitHub
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆453Jan 22, 2025Updated last year
Algorithmic-Alignment-Lab / CommonClaim
View on GitHub
Explore, Establish, Exploit: Red Teaming Language Models from Scratch
☆15Jun 21, 2023Updated 3 years ago
ejones313 / auditing-llms
View on GitHub
☆61Mar 9, 2023Updated 3 years ago
uiuc-arc / llm-code-watermark
View on GitHub
LLM Program Watermarking
☆19Apr 19, 2024Updated 2 years ago
aliasgharkhani / Masktune
View on GitHub
☆29Jan 16, 2023Updated 3 years ago
ZZZhr-1 / Robust_GUI_Grounding
View on GitHub
On the Robustness of GUI Grounding Models Against Image Attacks
☆12Apr 8, 2025Updated last year
azshue / AutoPoison
View on GitHub
The official repository of the paper "On the Exploitability of Instruction Tuning".
☆70Feb 5, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
claws-lab / casper
View on GitHub
Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"
☆10Aug 16, 2022Updated 3 years ago
sejoonoh / ATR
View on GitHub
Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"
☆12Aug 1, 2024Updated last year
shreyansh26 / Attention-Mask-Patterns
View on GitHub
Using FlexAttention to compute attention with different masking patterns
☆47Sep 22, 2024Updated last year
xuxiong0214 / BTIDBF
View on GitHub
☆18Feb 25, 2024Updated 2 years ago
yiwiy9 / atcoder-rust-devcontainer
View on GitHub
This is a repository that provides a development environment for Rust programming language, pre-configured with tools and settings optimi…
☆17Apr 11, 2026Updated 3 months ago
MtSomeThree / constrDecoding
View on GitHub
Constrained Decoding Project
☆20Nov 10, 2023Updated 2 years ago
ssagawa / overparam_spur_corr
View on GitHub
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
☆30Jul 12, 2020Updated 6 years ago
pinghsieh / FSAF
View on GitHub
☆11Oct 25, 2021Updated 4 years ago
facebookresearch / text-adversarial-attack
View on GitHub
Repo for arXiv preprint "Gradient-based Adversarial Attacks against Text Transformers"
☆113Dec 28, 2022Updated 3 years ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
LLM-QC / AdversariaLLM
View on GitHub
Toolbox to run adversarial attacks against LLM.
☆31Jun 10, 2026Updated last month
CogComp / illinois-sl
View on GitHub
A general-purpose Java library for performing structured learning.
☆23Jul 5, 2022Updated 4 years ago
sherdencooper / GPTFuzz
View on GitHub
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆601Feb 27, 2026Updated 4 months ago
suchithnarayan / ai-skill-scanner
View on GitHub
AI-powered security scanner for Claude Code plugins and skills - LLM analysis, static rules, taint tracking, CI/PR integration, and inte…
☆15Mar 1, 2026Updated 4 months ago
shreyansh26 / ML-Optimizers-JAX
View on GitHub
Toy implementations of some popular ML optimizers using Python/JAX
☆44Jun 20, 2021Updated 5 years ago
chujiezheng / LLM-Extrapolation
View on GitHub
Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"
☆75May 20, 2025Updated last year
PlusLabNLP / Active-IT
View on GitHub
Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"
☆26Nov 16, 2023Updated 2 years ago
eth-sri / watermark-stealing
View on GitHub
Watermark Stealing in Large Language Models (ICML '24)
☆30Jun 24, 2024Updated 2 years ago
kost / rdpcmd-ruby
View on GitHub
Run commands over RDP on massive number of hosts
☆11Nov 26, 2018Updated 7 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
papersPapers / BadPrompt
View on GitHub
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆41Jul 8, 2024Updated 2 years ago
GraySwanAI / ipi_arena_os
View on GitHub
☆43Mar 18, 2026Updated 4 months ago
Princeton-SysML / Jailbreak_LLM
View on GitHub
☆203Nov 26, 2023Updated 2 years ago
sguo28 / DROP_Simulator
View on GitHub
☆11Jun 6, 2024Updated 2 years ago
XuandongZhao / weak-to-strong
View on GitHub
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90May 2, 2025Updated last year
yunglau / QGFN
View on GitHub
QGFN: Controllable Greediness with Action Values - Code
☆11May 17, 2024Updated 2 years ago
declare-lab / red-instruct
View on GitHub
Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
☆111Mar 8, 2024Updated 2 years ago