ethz-spylab/unlearning-vs-safety

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ethz-spylab/unlearning-vs-safety)

ethz-spylab / unlearning-vs-safety

☆27

Alternatives and similar repositories for unlearning-vs-safety

Users that are interested in unlearning-vs-safety are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

OPTML-Group / Unlearn-Simple
View on GitHub
[NeurIPS25] Official repo for "Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning"
☆45Oct 3, 2025Updated 9 months ago
OPTML-Group / SOUL
View on GitHub
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆30Oct 1, 2024Updated last year
Lslland / T-Vaccine
View on GitHub
☆19Jun 21, 2025Updated last year
boyiwei / CoTaEval
View on GitHub
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17Jul 17, 2024Updated 2 years ago
OPTML-Group / WAGLE
View on GitHub
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19Dec 16, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
zhliu0106 / learning-to-refuse
View on GitHub
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10Dec 13, 2024Updated last year
princeton-nlp / benign-data-breaks-safety
View on GitHub
☆47Oct 1, 2024Updated last year
jaechan-repo / muse_bench
View on GitHub
☆33Aug 9, 2024Updated last year
sail-sg / closer-look-LLM-unlearning
View on GitHub
[ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models
☆49Dec 4, 2024Updated last year
centerforaisafety / wmdp
View on GitHub
WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…
☆176May 29, 2025Updated last year
EnnengYang / Efficient-WEMoE
View on GitHub
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16Oct 28, 2024Updated last year
franciscoliu / Awesome-GenAI-Unlearning
View on GitHub
☆188Apr 22, 2026Updated 3 months ago
git-disl / Vaccine
View on GitHub
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆51Jan 15, 2026Updated 6 months ago
arobey1 / advbench
View on GitHub
☆45Mar 3, 2023Updated 3 years ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
houseme / sensitive-rs
View on GitHub
Sensitive-rs is a Rust library for finding, validating, filtering, and replacing sensitive words. It provides efficient algorithms to han…
☆26Jul 22, 2026Updated last week
NJUPT-SAST / aurora-ui
View on GitHub
🌏 UI component library for the future, based on WebComponent.
☆23Nov 12, 2024Updated last year
licong-lin / negative-preference-optimization
View on GitHub
☆76Jul 15, 2024Updated 2 years ago
swj0419 / muse_bench
View on GitHub
☆34Mar 13, 2025Updated last year
rucnyz / LeakAgent
View on GitHub
☆29Aug 31, 2025Updated 10 months ago
UCSB-NLP-Chang / causal_unlearn
View on GitHub
[EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"
☆35Jul 22, 2024Updated 2 years ago
chrisliu298 / llm-unlearn-eco
View on GitHub
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
☆41Sep 26, 2024Updated last year
kevinyaobytedance / llm_unlearn
View on GitHub
LLM Unlearning
☆185Oct 20, 2023Updated 2 years ago
AIM-Intelligence / RepBend
View on GitHub
Code for Representation Bending Paper
☆16Jul 15, 2025Updated last year
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
rishub-tamirisa / tamper-resistance
View on GitHub
[ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"
☆68Jun 9, 2025Updated last year
princeton-polaris-lab / Evaluating-Durable-Safeguards
View on GitHub
[ICLR 2025] On Evluating the Durability of Safegurads for Open-Weight LLMs
☆13Jun 20, 2025Updated last year
qinyiwei / InfoBench
View on GitHub
☆61Aug 22, 2024Updated last year
GraySwanAI / circuit-breakers
View on GitHub
Improving Alignment and Robustness with Circuit Breakers
☆266Sep 24, 2024Updated last year
jinzhuoran / RWKU
View on GitHub
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆100Sep 30, 2024Updated last year
scaleapi / mrt
View on GitHub
https://scale.com/research/mrt
☆20Mar 16, 2026Updated 4 months ago
SALT-NLP / Efficient_Unlearning
View on GitHub
☆38Oct 18, 2023Updated 2 years ago
1andrevich / antifilter-domain
View on GitHub
Generated geosite.dat based on Antifilter Community List
☆29Updated this week
neelnanda-io / Neuroscope
View on GitHub
Accompanying codebase for neuroscope.io, a website for displaying max activating dataset examples for language model neurons
☆15Feb 13, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
annahdo / implementing_activation_steering
View on GitHub
A collection of different ways to implement accessing and modifying internal model activations for LLMs
☆24Oct 18, 2024Updated last year
graldij / transformer-fusion
View on GitHub
Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.
☆31Apr 19, 2024Updated 2 years ago
phax / en16931-cii2ubl
View on GitHub
Converter for EN16931 invoices from CII to UBL
☆45Updated this week
LiuAmber / RAHF
View on GitHub
[ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…
☆28Sep 25, 2024Updated last year
WanliYoung / Revisit-Editing-Evaluation
View on GitHub
Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild"
☆18Aug 27, 2025Updated 11 months ago
NLie2 / what_features_jailbreak_LLMs
View on GitHub
☆18Mar 30, 2025Updated last year
Babelscape / ALERT
View on GitHub
Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming"
☆60Sep 20, 2024Updated last year