Lingzhi-WANG / KGAUnlearnLinks

☆17

Alternatives and similar repositories for KGAUnlearn

Users that are interested in KGAUnlearn are comparing it to the libraries listed below

Sorting:

AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆42Updated 7 months ago
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆73Updated 9 months ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆37Updated last year
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆43Updated 6 months ago
sophie-xhonneux / Continuous-AdvTrain
☆22Updated 9 months ago
MartinPawel / In-Context-Unlearning
"In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024.
☆26Updated last year
Lyz1213 / BadEdit
☆28Updated 7 months ago
AISafety-HKUST / Backdoor_Safety_Tuning
Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)
☆26Updated 6 months ago
shiningrain / JailGuard
☆15Updated 2 months ago
wegodev2 / virtual-prompt-injection
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆18Updated 11 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆24Updated 10 months ago
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆16Updated 2 months ago
csdongxian / ANP_backdoor
Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"
☆57Updated 2 years ago
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆35Updated 10 months ago
THU-KEG / WaterBench
[ACL2024-Main] Data and Code for WaterBench: Towards Holistic Evaluation of LLM Watermarks
☆26Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆126Updated last month
David-Li0406 / AI-Supervision-Risk
☆20Updated 2 months ago
swj0419 / muse_bench
☆21Updated 2 months ago
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆57Updated 8 months ago
KID-22 / LLM-Unlearning-Paper-List
☆28Updated 11 months ago
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆53Updated 9 months ago
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆10Updated 7 months ago
reds-lab / CLIP-MIA
This is an official repository for Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV2023…
☆22Updated last year
papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆36Updated 11 months ago
chenchenygu / watermark-learnability
☆26Updated 3 months ago
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large laguage model (LLM) jailbreaking defense.
☆51Updated 7 months ago
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆56Updated 5 months ago
ledllm / ledllm
☆19Updated 11 months ago
UCF-ML-Research / TrojLLM
☆25Updated 11 months ago
alewarne / MachineUnlearning
Code related to the paper "Machine Unlearning of Features and Labels"
☆69Updated last year