DAMO-NLP-SG/multilingual-safety-for-LLMs

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/DAMO-NLP-SG/multilingual-safety-for-LLMs)

DAMO-NLP-SG / multilingual-safety-for-LLMs

[ICLR 2024]Data for "Multilingual Jailbreak Challenges in Large Language Models"

☆106

Alternatives and similar repositories for multilingual-safety-for-LLMs

Users that are interested in multilingual-safety-for-LLMs are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zhliu0106 / learning-to-refuse
View on GitHub
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10Dec 13, 2024Updated last year
Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
DAMO-NLP-SG / IE-E2H
View on GitHub
Easy-to-Hard Learning for Information Extraction (ACL 2023 Findings)
☆14Jul 11, 2023Updated 3 years ago
paul-rottger / xstest
View on GitHub
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆138Feb 24, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
NJUNLP / ReNeLLM
View on GitHub
The official implementation of our NAACL 2024 paper "A Wolf in Sheep’s Clothing: Generalized Nested Jailbreak Prompts can Fool Large Lang…
☆163Sep 2, 2025Updated 10 months ago
vinid / safety-tuned-llamas
View on GitHub
ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.
☆95May 9, 2024Updated 2 years ago
huizhang-L / CodeChameleon
View on GitHub
☆30Mar 20, 2024Updated 2 years ago
DAMO-NLP-SG / SeaLLMs-Audio
View on GitHub
☆53Dec 7, 2025Updated 7 months ago
ethz-spylab / unlearning-vs-safety
View on GitHub
☆27Oct 6, 2024Updated last year
princeton-nlp / benign-data-breaks-safety
View on GitHub
☆47Oct 1, 2024Updated last year
QwenLM / online_merging_optimizers
View on GitHub
Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment
☆82Jun 19, 2024Updated 2 years ago
AI45Lab / ActorAttack
View on GitHub
☆135Jun 29, 2026Updated 3 weeks ago
Princeton-SysML / Jailbreak_LLM
View on GitHub
☆203Nov 26, 2023Updated 2 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
EnnengYang / Efficient-WEMoE
View on GitHub
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16Oct 28, 2024Updated last year
patrickrchao / JailbreakingLLMs
View on GitHub
☆756Jul 2, 2025Updated last year
HelloEveryboby / Butler
View on GitHub
Butler 是一个用于自动化服务管理和任务调度的工具项目。
☆17Updated this week
chujiezheng / LLM-Safeguard
View on GitHub
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆108May 20, 2025Updated last year
RUCAIBox / HADES
View on GitHub
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39Oct 23, 2024Updated last year
qizhangli / Gradient-based-Jailbreak-Attacks
View on GitHub
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12Nov 7, 2024Updated last year
Aatrox103 / SAP
View on GitHub
☆49May 9, 2024Updated 2 years ago
alphadl / SafeLLM_with_IntentionAnalysis
View on GitHub
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21Mar 25, 2024Updated 2 years ago
uw-nsl / SafeDecoding
View on GitHub
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154Jul 19, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
Allen-piexl / JailbreakZoo
View on GitHub
☆171Sep 2, 2024Updated last year
DAMO-NLP-SG / contrastive-cot
View on GitHub
Contrastive Chain-of-Thought Prompting
☆69Nov 18, 2023Updated 2 years ago
boyiwei / CoTaEval
View on GitHub
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17Jul 17, 2024Updated 2 years ago
Libr-AI / do-not-answer
View on GitHub
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
☆335Jun 7, 2024Updated 2 years ago
sherdencooper / GPTFuzz
View on GitHub
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
☆601Feb 27, 2026Updated 4 months ago
OPTML-Group / WAGLE
View on GitHub
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19Dec 16, 2024Updated last year
jaechan-repo / muse_bench
View on GitHub
☆33Aug 9, 2024Updated last year
thu-ml / Attack-Bard
View on GitHub
☆108Feb 16, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆68Aug 25, 2024Updated last year
houseme / sensitive-rs
View on GitHub
Sensitive-rs is a Rust library for finding, validating, filtering, and replacing sensitive words. It provides efficient algorithms to han…
☆26Updated this week
NJUPT-SAST / aurora-ui
View on GitHub
🌏 UI component library for the future, based on WebComponent.
☆23Nov 12, 2024Updated last year
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
DAMO-NLP-SG / Auto-Arena-LLMs
View on GitHub
☆44Oct 7, 2024Updated last year
RobustNLP / CipherChat
View on GitHub
A framework to evaluate the generalization capability of safety alignment for LLMs
☆628Oct 9, 2025Updated 9 months ago
GraySwanAI / circuit-breakers
View on GitHub
Improving Alignment and Robustness with Circuit Breakers
☆266Sep 24, 2024Updated last year