SafeAILab/RAIN

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning

☆99

Alternatives and similar repositories for RAIN

Users who are interested in RAIN are comparing it to the libraries listed below.

uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154 · Jul 19, 2024 · Updated 2 years ago
alphadl / SafeLLM_with_IntentionAnalysis
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21 · Mar 25, 2024 · Updated 2 years ago
poloclub / llm-self-defense
LLM Self Defense: By Self Examination, LLMs know they are being tricked
☆52 · May 21, 2024 · Updated 2 years ago
SheltonLiu-N / AutoDAN
[ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…
☆453 · Jan 22, 2025 · Updated last year
yjw1029 / Self-Reminder
Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI.
☆57 · Nov 13, 2023 · Updated 2 years ago
arobey1 / smooth-llm
☆135 · Nov 13, 2023 · Updated 2 years ago
neelsjain / baseline-defenses
Official Code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models"
☆34 · Oct 26, 2023 · Updated 2 years ago
Princeton-SysML / Jailbreak_LLM
☆203 · Nov 26, 2023 · Updated 2 years ago
YancyKahn / CoA
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39 · Jan 17, 2025 · Updated last year
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 · Jul 9, 2024 · Updated 2 years ago
OSU-NLP-Group / AgentAttack
☆22 · Oct 25, 2024 · Updated last year
STAIR-BUPT / STAIR-LLMGuardrails
☆12 · Sep 29, 2024 · Updated last year
UCSB-NLP-Chang / SemanticSmooth
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
☆24 · Jun 9, 2024 · Updated 2 years ago
ethz-spylab / superhuman-ai-consistency
☆30 · Jun 19, 2023 · Updated 3 years ago
LLM-Tuning-Safety / LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358 · Feb 23, 2024 · Updated 2 years ago
OSU-NLP-Group / AmpleGCG
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM
☆87 · Nov 3, 2024 · Updated last year
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆363 · Oct 17, 2025 · Updated 9 months ago
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆108 · May 20, 2025 · Updated last year
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆31 · Nov 2, 2025 · Updated 8 months ago
hongyanz / multibranch
Codes for the paper "Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex"
☆21 · Jul 25, 2020 · Updated 5 years ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176 · Dec 18, 2024 · Updated last year
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 · Jul 9, 2024 · Updated 2 years ago
XuandongZhao / weak-to-strong
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90 · May 2, 2025 · Updated last year
verazuo / prompt-stealing-attack
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆53 · Jan 11, 2025 · Updated last year
microsoft / MageBench
Official Repo for MageBench: Bridging Large Multimodal Models to Agents
☆22 · Jan 8, 2025 · Updated last year
shizhouxing / LLM-Detector-Robustness
[TACL] Code for "Red Teaming Language Model Detectors with Language Models"
☆24 · Nov 24, 2023 · Updated 2 years ago
Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆90 · Mar 15, 2024 · Updated 2 years ago
aiPenguin / StopReasoning
☆15 · Oct 6, 2024 · Updated last year
ejones313 / auditing-llms
☆61 · Mar 9, 2023 · Updated 3 years ago
LLMSecurity / MasterKey
MASTERKEY is a framework designed to explore and exploit vulnerabilities in large language model chatbots by automating jailbreak attacks…
☆38 · Sep 12, 2024 · Updated last year
sherdencooper / GPTFuzz
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆601 · Feb 27, 2026 · Updated 4 months ago
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39 · Oct 23, 2024 · Updated last year
multimodal-art-projection / I-SHEEP
I-SHEEP: Iterative Self-enHancEmEnt Paradigm of LLMs through Self-Instruct and Self-Assessment
☆17 · Jan 16, 2025 · Updated last year
centerforaisafety / HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆1,011 · Aug 16, 2024 · Updated last year
yifeiwang77 / Self-Correction
☆20 · Nov 3, 2024 · Updated last year
fbarez / neuroplasticity
☆14 · Mar 31, 2024 · Updated 2 years ago
sokcertifiedrobustness / VeriGauge-deprecated
☆11 · Oct 18, 2022 · Updated 3 years ago
YitingQu / unsafe-diffusion
☆50 · Jul 14, 2024 · Updated 2 years ago
RapidResponseBench / rapidresponsebench
☆35 · Nov 12, 2024 · Updated last year