NY1024/SafeBench

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NY1024/SafeBench)

NY1024 / SafeBench

☆22

Alternatives and similar repositories for SafeBench

Users that are interested in SafeBench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

SproutNan / AI-Safety_Benchmark
View on GitHub
The official repository for guided jailbreak benchmark
☆31Jul 28, 2025Updated 11 months ago
DripNowhy / ETA
View on GitHub
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆34Jul 20, 2025Updated last year
CryptoAILab / FigStep
View on GitHub
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆212Jun 26, 2025Updated last year
ys-zong / VLGuard
View on GitHub
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆90Jan 19, 2025Updated last year
isXinLiu / MM-SafetyBench
View on GitHub
Accepted by ECCV 2024
☆218Oct 15, 2024Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
AI45Lab / MLLMGuard
View on GitHub
☆46Jun 19, 2025Updated last year
EchoseChen / SPA-VL-RLHF
View on GitHub
The reinforcement learning codes for dataset SPA-VL
☆48Jun 24, 2024Updated 2 years ago
paul-rottger / msts-multimodal-safety
View on GitHub
Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"
☆20Mar 31, 2025Updated last year
shiningrain / JailGuard
View on GitHub
☆32Mar 16, 2025Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
View on GitHub
☆81Mar 30, 2025Updated last year
kigb / DropoutDecoding
View on GitHub
[NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"
☆22Dec 8, 2024Updated last year
Vinsonzyh / BlueSuffix
View on GitHub
[ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
☆31Nov 2, 2025Updated 8 months ago
RUCAIBox / HADES
View on GitHub
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …
☆39Oct 23, 2024Updated last year
xirui-li / DrAttack
View on GitHub
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes Powerful LLM Jailbreakers
☆68Aug 25, 2024Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
TeamPigeonLab / CS-DJ
View on GitHub
Accept by CVPR 2025 (highlight)
☆25Jun 8, 2025Updated last year
yuplin2333 / representation-space-jailbreak
View on GitHub
Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…
☆24Jul 26, 2024Updated last year
yueliu1999 / GuardReasoner-VL
View on GitHub
[NeurIPS 2025] An official source code for paper "GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning".
☆123Feb 22, 2026Updated 4 months ago
Leey21 / CipherBank
View on GitHub
☆13Jun 13, 2025Updated last year
LIONS-EPFL / Charmer
View on GitHub
Revisiting Character-level Adversarial Attacks for Language Models, ICML 2024
☆19Feb 12, 2025Updated last year
TsinghuaC3I / FS-GEN
View on GitHub
Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding.
☆13Nov 19, 2024Updated last year
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
View on GitHub
☆61Jun 5, 2024Updated 2 years ago
UCSB-AI / MSSBench
View on GitHub
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆36Jun 23, 2025Updated last year
IBM / Adversarial-Prompt-Evaluation
View on GitHub
Code Implementation of Adversarial Prompt Evaluation paper
☆14Sep 18, 2025Updated 10 months ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
thefcraft / prompt-generator-stable-diffusion
View on GitHub
Prompt Generator model for Stable Diffusion Models
☆12Jun 20, 2023Updated 3 years ago
Longin-Yu / ComRoPE
View on GitHub
☆11Jun 11, 2025Updated last year
Dtc7w3PQ / Visco-Attack
View on GitHub
Official implementation of Visco-Attack (EMNLP 2025 Main). An open-source one-click reproduction script is also provided.
☆30Apr 11, 2026Updated 3 months ago
dsbowen / strong_reject
View on GitHub
☆146Jul 7, 2025Updated last year
mbzuai-nlp / AudioJailbreak
View on GitHub
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
☆32Oct 6, 2025Updated 9 months ago
SaFo-Lab / JailBreakV_28K
View on GitHub
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur…
☆96May 9, 2025Updated last year
UCSB-AI / SafeKey
View on GitHub
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
☆16May 12, 2026Updated 2 months ago
claws-lab / casper
View on GitHub
Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"
☆10Aug 16, 2022Updated 3 years ago
sejoonoh / ATR
View on GitHub
Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"
☆12Aug 1, 2024Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
AKuzina / defend_vae_mcmc
View on GitHub
Code repository of the paper "Alleviating Adversarial Attacks on Variational Autoencoders with MCMC" published at NeurIPS 2022. https://a…
☆10Dec 14, 2022Updated 3 years ago
dragonlzm / PAVE
View on GitHub
This repo holds the implementation of PAVE: Patching and Adapting Video Large Language Models (CVPR2025)
☆27Sep 6, 2025Updated 10 months ago
zhaoyiran924 / Probe-Sampling
View on GitHub
[NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling
☆35Nov 8, 2024Updated last year
Liar-Mask / FedMIA
View on GitHub
☆20Jan 26, 2025Updated last year
RylanSchaeffer / AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer
View on GitHub
Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
☆37Jun 1, 2025Updated last year
theshi-1128 / ReDPJ
View on GitHub
A novel jailbreak attack unveiling an overlooked attack surface inherently in the chain-of-thought reasoning trajectory of LLMs
☆22Apr 3, 2026Updated 3 months ago
cnut1648 / Model-Fingerprint
View on GitHub
Fingerprint large language models
☆52Jul 11, 2024Updated 2 years ago