wangyu-ovo / MML

Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage"

☆25

Alternatives and similar repositories for MML

Users who are interested in MML are comparing it to the repositories listed below.

RUCAIBox / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆33 Updated last year
tmllab / 2025_ICLR_PiF
☆37 Updated 7 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆54 Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆65 Updated 8 months ago
erfanshayegani / Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…
☆77 Updated last year
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆179 Updated last year
SaFo-Lab / JailBreakV_28K
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆84 Updated 7 months ago
roywang021 / IDEATOR
Code for the ICCV 2025 paper "IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves"
☆14 Updated 5 months ago
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆182 Updated 6 months ago
Haochen-Luo / CroPA
☆54 Updated last year
abc03570128 / Jailbreaking-Attack-against-Multimodal-Large-Language-Model
☆54 Updated last year
roywang021 / UMK
Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models
☆31 Updated 11 months ago
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (Highlight)
☆21 Updated 6 months ago
thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 Updated last year
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆32 Updated 7 months ago
ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…
☆47 Updated 5 months ago
salman-lui / x-teaming
☆48 Updated 7 months ago
thu-ml / MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
☆173 Updated 6 months ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆81 Updated 11 months ago
ericyinyzy / VLAttack
This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2…
☆63 Updated 9 months ago
Vinsonzyh / BlueSuffix
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆30 Updated last month
naver-ai / JOOD
[CVPR 2025] Official implementation for JOOD "Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy"
☆19 Updated 6 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆78 Updated this week
WangCheng0116 / Awesome-LRMs-Safety
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …
☆83 Updated 4 months ago
PKU-ML / PAT
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22 Updated 7 months ago
Dtc7w3PQ / Visco-Attack
Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts.
☆26 Updated 4 months ago
XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆17 Updated last year
NY1024 / Foundation-Model-Paper-Notes
☆71 Updated 7 months ago
ybwang119 / label_recovery
[ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks
☆14 Updated last year
AI-secure / AdvAgent
☆20 Updated 7 months ago