chrisliu298 / awesome-llm-unlearningLinks

A resource repository for machine unlearning in large language models

☆506

Alternatives and similar repositories for awesome-llm-unlearning

Users that are interested in awesome-llm-unlearning are comparing it to the libraries listed below

Sorting:

locuslab / open-unlearning
[NeurIPS D&B '25] The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning metho…
☆423Updated last month
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attack for large language model
☆220Updated this week
hzy312 / Awesome-LLM-Watermark
UP-TO-DATE LLM Watermark paper. 🔥🔥🔥
☆362Updated 11 months ago
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆177Updated 2 years ago
franciscoliu / Awesome-GenAI-Unlearning
☆175Updated this week
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆140Updated last year
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc..
☆284Updated 7 months ago
KID-22 / LLM-Unlearning-Paper-List
☆28Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆163Updated 6 months ago
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆123Updated 2 months ago
iamgroot42 / mimir
Python package for measuring memorization in LLMs.
☆173Updated 4 months ago
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The lastest paper about detection of LLM-generated text and code
☆280Updated 5 months ago
EnnengYang / Awesome-Model-Merging-Methods-Theories-Applications
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
☆599Updated this week
ydyjya / LLM-IHS-Explanation
☆55Updated last year
niconi19 / LLM-Conversation-Safety
[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆107Updated last year
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆111Updated last month
QinbinLi / LLM-PBE
A toolkit to assess data privacy in LLMs (under development)
☆63Updated 10 months ago
isXinLiu / Awesome-MLLM-Safety
Accepted by IJCAI-24 Survey Track
☆223Updated last year
zepingyu0512 / awesome-SAE
awesome SAE papers
☆59Updated 5 months ago
sail-sg / closer-look-LLM-unlearning
[ICLR 2025] A Closer Look at Machine Unlearning for Large Language Models
☆41Updated 11 months ago
swj0419 / muse_bench
☆29Updated 8 months ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆47Updated last year
facebookresearch / advprompter
Official implementation of AdvPrompter https//arxiv.org/abs/2404.16873
☆170Updated last year
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆171Updated last year
licong-lin / negative-preference-optimization
☆68Updated last year
SproutNan / AI-Safety_SCAV
This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"
☆46Updated last month
ydyjya / SafetyHeadAttribution
☆54Updated 5 months ago
zepingyu0512 / awesome-LLM-neuron
☆32Updated 5 months ago
i-gallegos / Fair-LLM-Benchmark
☆156Updated 2 years ago
LLM-Tuning-Safety / LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆330Updated last year