CryptoAILab / MergeGuard

[CCS-LAMPS'24] LLM IP Protection Against Model Merging

☆16

Alternatives and similar repositories for MergeGuard

Users who are interested in MergeGuard are comparing it to the repositories listed below.

papersPapers / BadPrompt
Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts"
☆40Updated last year
Alibaba-AAIG / Oyster
The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster …
☆57Updated 4 months ago
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆66Updated last year
yxoh / prompt_leak_usenix2024
☆13Updated last year
clearloveclearlove / BEAT
☆14Updated 10 months ago
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆20Updated 2 years ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24Updated last year
shiningrain / JailGuard
☆24Updated 9 months ago
AISafety-HKUST / Backdoor_Safety_Tuning
Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)
☆27Updated last year
grasses / PoisonPrompt
Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo: 124.220.228.133:11107
☆19Updated last year
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63Updated last year
byerose / Awesome-Foundation-Model-Security
A curated list of trustworthy Generative AI papers. Daily updating...
☆75Updated last year
umd-huang-lab / VLM-Poisoning
Code for NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆58Updated 11 months ago
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆47Updated last year
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆39Updated 2 years ago
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆10Updated last year
ZrW00 / MuScleLoRA
The code implementation of MuScleLoRA (Accepted in ACL 2024)
☆10Updated last year
THU-BPM / unforgeable_watermark
Source code of paper "An Unforgeable Publicly Verifiable Watermark for Large Language Models" accepted by ICLR 2024
☆34Updated last year
inspire-group / RobustRAG
☆21Updated last year
AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆58Updated 3 months ago
AI-secure / AdvAgent
☆20Updated 7 months ago
THU-BPM / Robust_Watermark
Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024.
☆37Updated last year
MartinPawelczyk / In-Context-Unlearning
"In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel* and Himabindu Lakkaraju*; ICML 2024.
☆28Updated 2 years ago
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆86Updated 10 months ago
David-Li0406 / AI-Supervision-Risk
☆21Updated 9 months ago
sail-sg / AnyDoor
AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models
☆60Updated last year
sophie-xhonneux / Continuous-AdvTrain
☆34Updated 4 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆55Updated last year
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆105Updated last year
zqypku / mm_poison
☆21Updated 2 years ago