ThuCCSLab / misalignmentLinks

[NDSS'25] The official implementation of safety misalignment.

☆16

Alternatives and similar repositories for misalignment

Users that are interested in misalignment are comparing it to the libraries listed below

Sorting:

Lyz1213 / BadEdit
☆35Updated 11 months ago
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63Updated 9 months ago
wanlunsec / Beatrix
☆25Updated 2 years ago
WUSTL-CSPL / LLMJailbreak
☆36Updated last year
Gwinhen / DRUPE
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆17Updated last year
tmllab / 2025_ICLR_PiF
☆35Updated 4 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆52Updated last year
Eyr3 / TextCRS
Text-CRS: A Generalized Certified Robustness Framework against Textual Adversarial Attacks (IEEE S&P 2024)
☆34Updated 3 months ago
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆19Updated 5 months ago
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆22Updated 6 months ago
inspire-group / RobustRAG
☆20Updated last year
umd-huang-lab / VLM-Poisoning
Code for Neurips 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆55Updated 8 months ago
SolidShen / RIPPLE_official
☆20Updated last year
wagner-group / MarkMyWords
☆30Updated last year
RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆29Updated 11 months ago
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆19Updated 2 years ago
RU-System-Software-and-Security / FeatureRE
☆27Updated 2 years ago
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆10Updated 11 months ago
KuofengGao / Verbose_Images
[ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
☆41Updated last year
PurduePAML / DBS
☆18Updated 3 years ago
yxoh / prompt_leak_usenix2024
☆13Updated last year
TeamPigeonLab / CS-DJ
Accept by CVPR 2025 (highlight)
☆18Updated 4 months ago
lancopku / agent-backdoor-attacks
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆91Updated last year
Django-Jiang / BadChain
[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
☆38Updated last year
ybwang119 / label_recovery
[ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks
☆13Updated last year
TrustAIRLab / VoiceJailbreakAttack
Code for Voice Jailbreak Attacks Against GPT-4o.
☆34Updated last year
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆39Updated last year
tianshuocong / TePA
[S&P'24] Test-Time Poisoning Attacks Against Test-Time Adaptation Models
☆19Updated 7 months ago
Sizhe-Chen / StruQ
official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries
☆45Updated 2 months ago
verazuo / prompt-stealing-attack
[USENIX'24] Prompt Stealing Attacks Against Text-to-Image Generation Models
☆45Updated 9 months ago