IBM/SafeLoRA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/IBM/SafeLoRA)

IBM / SafeLoRA

Github repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"

☆29

Alternatives and similar repositories for SafeLoRA

Users that are interested in SafeLoRA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

homles11 / SaLoRA
View on GitHub
Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation(ICLR 2025)”
☆29Oct 23, 2025Updated 8 months ago
aladinD / SafeMERGE
View on GitHub
Code for SafeMERGE (ICLR 2025).
☆15Apr 1, 2025Updated last year
LLLeoLi / LARF
View on GitHub
[EMNLP 2025] Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
☆15Jul 22, 2025Updated 11 months ago
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
PKU-YuanGroup / AsFT
View on GitHub
Code for the paper "AsFT: Anchoring Safety During LLM Fune-Tuning Within Narrow Safety Basin".
☆37Jul 10, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
git-disl / Lisa
View on GitHub
This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)
☆28Sep 10, 2024Updated last year
poloclub / llm-landscape
View on GitHub
NeurIPS'24 - LLM Safety Landscape
☆40Oct 21, 2025Updated 8 months ago
git-disl / awesome_LLM-harmful-fine-tuning-papers
View on GitHub
A survey on harmful fine-tuning attack for large language model (ACM CSUR)
☆247Jun 22, 2026Updated 3 weeks ago
fangjf1 / OpenSafeMLRM
View on GitHub
The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods!
☆15Apr 8, 2025Updated last year
Jihuai-wpy / InferAligner
View on GitHub
Inference-time alignment for harmlessness through cross-model guidance (ACL 2024). Code + MM-Harmful Bench.
☆38Oct 2, 2024Updated last year
CHATS-lab / LLMs_Encode_Harmfulness_Refusal_Separately
View on GitHub
☆41Jul 3, 2026Updated 2 weeks ago
6zHAOyi / BadVision
View on GitHub
This is an official code repository for CVPR 2025 paper BadVision.
☆15Nov 18, 2025Updated 8 months ago
DYR1 / MoGU
View on GitHub
Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.
☆18Jan 14, 2025Updated last year
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
listen0425 / Safety-Layers
View on GitHub
code space of paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)
☆25Apr 26, 2025Updated last year
hanshen95 / SEAL
View on GitHub
An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.
☆24Feb 20, 2025Updated last year
chanind / claude-auto-research-synthsaebench
View on GitHub
☆23Mar 11, 2026Updated 4 months ago
CERT-Lab / abba
View on GitHub
(ICLR '26) ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models
☆22Sep 25, 2025Updated 9 months ago
vfleaking / PTST
View on GitHub
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆22Sep 21, 2025Updated 9 months ago
Zhaoxian-Wu / IOS
View on GitHub
Code for paper "Byzantine-Resilient Decentralized Stochastic Optimization with Robust Aggregation Rules"
☆20Apr 19, 2024Updated 2 years ago
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
baixianghuang / editing-attack
View on GitHub
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21Dec 26, 2025Updated 6 months ago
arnab-api / romba
View on GitHub
Applies ROME and MEMIT on Mamba-S4 models
☆16Apr 5, 2024Updated 2 years ago
GPUs on demand by Runpod - Special Offer Available • Ad
Run AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
SproutNan / AI-Safety_SCAV
View on GitHub
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆49Oct 13, 2025Updated 9 months ago
HydroXai / Enhancing-Safety-in-Large-Language-Models
View on GitHub
Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…
☆11Nov 26, 2024Updated last year
chicosirius / think-or-not
View on GitHub
☆22May 23, 2025Updated last year
itsvaibhav01 / Immune
View on GitHub
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28Jun 11, 2025Updated last year
Zeyi-Lin / easy-r1
View on GitHub
Train deepseek r1-like reasoning LLM with ease | 轻松训练1个deepseek r1类的推理LLM
☆20Feb 15, 2025Updated last year
rezashkv / diffusion_pruning
View on GitHub
[ICLR 2025] Adaptive prompt tailored pruning of T2I diffusion models.
☆15Feb 1, 2025Updated last year
Unispac / shallow-vs-deep-alignment
View on GitHub
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆186Apr 23, 2025Updated last year
thu-ml / STAIR
View on GitHub
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89Feb 26, 2025Updated last year
ColinLu50 / SafeDelta
View on GitHub
The official code repo for "Safe Delta: Consistently Preserving Safety when Fine-Tuning LLMs on Diverse Datasets" in ICML 2025.
☆59Feb 12, 2026Updated 5 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
epfml / getting-started
View on GitHub
☆34Mar 18, 2026Updated 4 months ago
hqsiswiliam / hira
View on GitHub
The repo for HiRA paper
☆38Jan 9, 2026Updated 6 months ago
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
chiayi-hsu / Ring-A-Bell
View on GitHub
☆45Jan 15, 2025Updated last year
declare-lab / resta
View on GitHub
Restore safety in fine-tuned language models through task arithmetic
☆33Mar 28, 2024Updated 2 years ago
tdemin16 / multi-lane
View on GitHub
Official Implementation of MULTI-LANE (Multi Label class incremental learning via summarising pAtch tokeN Embeddings). Published in 3rd C…
☆15Feb 20, 2025Updated last year
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
View on GitHub
☆61Jun 5, 2024Updated 2 years ago