CryptoAILab / misalignmentLinks

[NDSS'25] The official implementation of safety misalignment.

☆17

Alternatives and similar repositories for misalignment

Users that are interested in misalignment are comparing it to the libraries listed below

Sorting:

NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆53Updated last year
Lyz1213 / BadEdit
☆36Updated last year
yxoh / prompt_leak_usenix2024
☆13Updated last year
inspire-group / RobustRAG
☆21Updated last year
TeamPigeonLab / CS-DJ
Accept by CVPR 2025 (highlight)
☆22Updated 5 months ago
tmllab / 2025_ICLR_PiF
☆37Updated 6 months ago
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆20Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆33Updated last year
cnut1648 / Model-Fingerprint
Fingerprint large language models
☆43Updated last year
clearloveclearlove / BEAT
☆14Updated 9 months ago
WUSTL-CSPL / LLMJailbreak
☆37Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆43Updated last year
lancopku / codable-watermarking-for-llm
Repository for Towards Codable Watermarking for Large Language Models
☆38Updated 2 years ago
wagner-group / MarkMyWords
☆32Updated last year
KuofengGao / Verbose_Images
[ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
☆40Updated last year
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆39Updated last year
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63Updated 11 months ago
MaTengSYSU / HIMRD-jailbreak
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆13Updated 3 months ago
umd-huang-lab / VLM-Poisoning
Code for Neurips 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆56Updated 10 months ago
uw-nsl / CleanGen
[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
☆20Updated 8 months ago
PurduePAML / DBS
☆18Updated 3 years ago
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆21Updated 6 months ago
Eyr3 / TextCRS
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024)
☆34Updated 5 months ago
grasses / PoisonPrompt
Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107
☆18Updated last year
RU-System-Software-and-Security / FeatureRE
☆27Updated 3 years ago
OSU-NLP-Group / EIA_against_webagent
☆36Updated last year
verazuo / prompt-stealing-attack
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆47Updated 10 months ago
yfchen1994 / poisoning_membership
☆20Updated last month
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆99Updated last year
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆24Updated 8 months ago