Sadcardation/MLLM-Refusal

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Sadcardation/MLLM-Refusal)

Sadcardation / MLLM-Refusal

Repository for the Paper: Refusing Safe Prompts for Multi-modal Large Language Models

☆18

Alternatives and similar repositories for MLLM-Refusal

Users that are interested in MLLM-Refusal are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

itsvaibhav01 / Immune
View on GitHub
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆27Jun 11, 2025Updated 9 months ago
inspire-group / tta_risk
View on GitHub
☆14Jun 6, 2023Updated 2 years ago
meet-cjli / CTRL
View on GitHub
An Embarrassingly Simple Backdoor Attack on Self-supervised Learning
☆20Jan 24, 2024Updated 2 years ago
kaiwenzha / contrastive-poisoning
View on GitHub
[ICLR 2023, Spotlight] Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning
☆31Dec 2, 2023Updated 2 years ago
jiah-li / magic
View on GitHub
The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.
☆14Dec 16, 2024Updated last year
yasamin-med / P2P
View on GitHub
☆21Mar 18, 2026Updated last week
wbopan / safety-residual-space
View on GitHub
☆21Mar 20, 2025Updated last year
RylanSchaeffer / AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer
View on GitHub
Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
☆36Jun 1, 2025Updated 9 months ago
MKYucel / hybrid_augment
View on GitHub
[ICCV 2023] HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness
☆17Sep 28, 2023Updated 2 years ago
facebookresearch / radioactive-watermark
View on GitHub
Code for the paper "Watermarking Makes Language Models Radioactive"
☆21Oct 25, 2024Updated last year
qizhangli / MoreBayesian-attack
View on GitHub
Code for our ICLR 2023 paper Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples.
☆18May 31, 2023Updated 2 years ago
isXinLiu / Awesome-MLLM-Safety
View on GitHub
Accepted by IJCAI-24 Survey Track
☆229Aug 25, 2024Updated last year
Gwinhen / DRUPE
View on GitHub
Distribution Preserving Backdoor Attack in Self-supervised Learning
☆20Jan 27, 2024Updated 2 years ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
View on GitHub
☆60Jun 5, 2024Updated last year
HyeonjeongHa / MM-PoisonRAG
View on GitHub
Official PyTorch implementation of "MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks"
☆12Dec 4, 2025Updated 3 months ago
SaFo-Lab / DoxBench
View on GitHub
[ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
☆25Feb 7, 2026Updated last month
Tele-EVOL / TeleAI-Safety
View on GitHub
☆23Jan 5, 2026Updated 2 months ago
jiaxiaojunQAQ / FP-Better
View on GitHub
Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS2024)
☆13Mar 29, 2024Updated last year
Cinofix / sponge_poisoning_energy_latency_attack
View on GitHub
Source code for the Energy-Latency Attacks via Sponge Poisoning paper.
☆15Mar 14, 2022Updated 4 years ago
llm-editing / editing-attack
View on GitHub
Code and dataset for the paper: "Can Editing LLMs Inject Harm?"
☆21Dec 26, 2025Updated 2 months ago
kubaneagle2000 / hello-world
View on GitHub
☆10Aug 19, 2024Updated last year
aidatatools / LLM_Sentinel
View on GitHub
A project (LLM Sentinel) that showcases NVIDIA's NeMo-Guardrails and LangChain for improving LLM safety
☆12Jan 22, 2025Updated last year
AI4Good24 / PsySafe
View on GitHub
☆52Feb 8, 2025Updated last year
thinwayliu / Multimodal-Unlearnable-Examples
View on GitHub
The code for ACM MM2024 (Multimodal Unlearnable Examples: Protecting Data against Multimodal Contrastive Learning)
☆15Jul 18, 2024Updated last year
chen37058 / Red-Team-Arxiv-Paper-Update
View on GitHub
Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)
☆100Updated this week
edadaltocg / detectors
View on GitHub
Python package to accelerate research on generalized out-of-distribution (OOD) detection.
☆15Jun 19, 2024Updated last year
huanranchen / VLMTransfer
View on GitHub
A package that achieves 95%+ transfer attack success rate against GPT-4
☆26Oct 24, 2024Updated last year
ybwang119 / label_recovery
View on GitHub
[ICLR 2024] Towards Elminating Hard Label Constraints in Gradient Inverision Attacks
☆14Feb 6, 2024Updated 2 years ago
LaVi-Lab / Rethink_CoT_Video
View on GitHub
Official code for "Rethinking Chain-of-Thought Reasoning for Videos"
☆20Dec 14, 2025Updated 3 months ago
k1rezaei / Text-to-concept
View on GitHub
☆35Feb 5, 2024Updated 2 years ago
Allen-piexl / JailbreakZoo
View on GitHub
☆165Sep 2, 2024Updated last year
liudaizong / Awesome-LVLM-Attack
View on GitHub
😎 up-to-date & curated list of awesome Attacks on Large-Vision-Language-Models papers, methods & resources.
☆521Updated this week
Hongyang-Du / VideoGPA
View on GitHub
VideoGPA is a self-supervised framework that enhances 3D consistency in Video Diffusion Models.
☆42Mar 16, 2026Updated last week
INTREBID / Awesome-MM-RAG
View on GitHub
This repository is for our survey paper: "A Comprehensive Survey on Multimodal RAG: All Combinations of Modalities as Input and Output"
☆46Nov 21, 2025Updated 4 months ago
ys-zong / VLGuard
View on GitHub
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆86Jan 19, 2025Updated last year
lafeat / apbench
View on GitHub
APBench: A Unified Availability Poisoning Attack and Defenses Benchmark (TMLR 08/2024)
☆46Apr 15, 2025Updated 11 months ago
showlab / FocusUI
View on GitHub
[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
☆25Feb 10, 2026Updated last month
CLUEbenchmark / Math24o
View on GitHub
Math24o: 高中奥林匹克数学竞赛测评集 High School Olympiad Mathematics Chinese Benchmark
☆11Mar 27, 2025Updated 11 months ago
gq-max / AdvDiffVLM
View on GitHub
☆48Apr 7, 2025Updated 11 months ago