jiaxiaojunQAQ/OmniSafeBench-MM

A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation

☆68

Alternatives and similar repositories for OmniSafeBench-MM

Users who are interested in OmniSafeBench-MM are comparing it to the repositories listed below.

Alibaba-AAIG / Oyster
The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem. | Oyster …
☆62 · Apr 29, 2026 · Updated last month
jiaxiaojunQAQ / FOA-Attack
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment (NeurIPS 2025)
☆65 · Nov 5, 2025 · Updated 6 months ago
jiaxiaojunQAQ / FP-Better
Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS2024)
☆13 · Mar 29, 2024 · Updated 2 years ago
zhipeng-wei / EmojiAttack
Emoji Attack [ICML 2025]
☆44 · Jul 15, 2025 · Updated 10 months ago
SproutNan / AI-Safety_Benchmark
The official repository for the guided jailbreak benchmark
☆29 · Jul 28, 2025 · Updated 10 months ago
yuplin2333 / representation-space-jailbreak
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆24 · Jul 26, 2024 · Updated last year
Huang-yihao / Personalization-based_backdoor
☆12 · Dec 18, 2024 · Updated last year
jiaxiaojunQAQ / FGSM-PGK
Improving fast adversarial training with prior-guided knowledge (TPAMI2024)
☆43 · Apr 21, 2024 · Updated 2 years ago
thefcraft / prompt-generator-stable-diffusion
Prompt Generator model for Stable Diffusion Models
☆12 · Jun 20, 2023 · Updated 2 years ago
NY1024 / SafeBench
☆22 · Oct 25, 2024 · Updated last year
jiaxiaojunQAQ / FGSM-LAW
Revisiting and Exploring Efficient Fast Adversarial Training via LAW: Lipschitz Regularization and Auto Weight Averaging (TIFS2024)
☆37 · Jun 4, 2024 · Updated last year
clearloveclearlove / BEAT
☆15 · Feb 26, 2025 · Updated last year
jiaxiaojunQAQ / I-GCG
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
☆146 · Apr 7, 2025 · Updated last year
fangjf1 / OpenSafeMLRM
The first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods!
☆15 · Apr 8, 2025 · Updated last year
MaTengSYSU / HIMRD-jailbreak
Code repository for the paper "Heuristic Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models"
☆17 · Aug 7, 2025 · Updated 9 months ago
Sandy-Zeng / NPAttack
PyTorch implementation of NPAttack
☆12 · Jul 7, 2020 · Updated 5 years ago
jiaxiaojunQAQ / FGSM-SDI
Code for Boosting fast adversarial training with learnable adversarial initialization (TIP2022)
☆29 · Aug 22, 2023 · Updated 2 years ago
jinghuichen / AWM
GitHub repo for One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NeurIPS 2022)
☆15 · Jan 3, 2023 · Updated 3 years ago
hanshen95 / penalized-bilevel-gradient-descent
An implementation of the penalty-based bilevel gradient descent (PBGD) algorithm and the iterative differentiation (ITD/RHG) methods.
☆19 · Feb 13, 2023 · Updated 3 years ago
XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆18 · Oct 22, 2024 · Updated last year
snu-mllab / Bayesian-Red-Teaming
Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)
☆15 · Jul 9, 2023 · Updated 2 years ago
Nathangitlab / Backdoor-Attacks-on-Crowd-Counting
Code for the ACM MM paper "Backdoor Attack on Crowd Counting"
☆17 · Jul 10, 2022 · Updated 3 years ago
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89 · Feb 26, 2025 · Updated last year
IBM / SafeLoRA
GitHub repo for NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models"
☆28 · Dec 21, 2025 · Updated 5 months ago
TeamPigeonLab / CS-DJ
Accepted by CVPR 2025 (highlight)
☆25 · Jun 8, 2025 · Updated 11 months ago
TRLou / HiT-ADV
The code of "Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds" CVPR 2024
☆36 · Mar 23, 2024 · Updated 2 years ago
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28 · Jun 11, 2025 · Updated 11 months ago
renjie3 / TUE
Code for Transferable Unlearnable Examples
☆22 · Mar 11, 2023 · Updated 3 years ago
aimagelab / HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆30 · Apr 8, 2025 · Updated last year
TLMichael / Delusive-Adversary
[NeurIPS 2021] Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training
☆32 · Jan 9, 2022 · Updated 4 years ago
RylanSchaeffer / AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer
Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
☆36 · Jun 1, 2025 · Updated 11 months ago
alenai97 / PEFT-MLLM
Official code and data for the ACL 2024 Findings paper "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models"
☆25 · Nov 10, 2024 · Updated last year
Saehyung-Lee / cifar10_challenge
Code for the CVPR 2020 article "Adversarial Vertex mixup: Toward Better Adversarially Robust Generalization"
☆12 · Jul 13, 2020 · Updated 5 years ago
roywang021 / UMK
Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models
☆32 · Dec 30, 2024 · Updated last year
thu-ml / Attack-Bard
☆109 · Feb 16, 2024 · Updated 2 years ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆206 · Oct 15, 2024 · Updated last year
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆73 · Feb 9, 2026 · Updated 3 months ago
poloclub / complicit-splat
3D Gaussian Splat Easily Attacked to Cause Harm
☆12 · Aug 5, 2025 · Updated 9 months ago
lingeringlight / SETA
The official implementation for SETA (TIP 2024).
☆11 · Feb 17, 2025 · Updated last year