☆22Oct 25, 2024Updated last year
Alternatives and similar repositories for SafeBench
Users that are interested in SafeBench are comparing it to the libraries listed below
Sorting:
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"☆30Jul 20, 2025Updated 8 months ago
- ☆26Mar 17, 2025Updated last year
- The official repository for guided jailbreak benchmark☆29Jul 28, 2025Updated 7 months ago
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks☆31Nov 2, 2025Updated 4 months ago
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794…☆23Jul 26, 2024Updated last year
- ☆45Jun 19, 2025Updated 9 months ago
- ☆27Jun 5, 2024Updated last year
- ☆76Mar 30, 2025Updated 11 months ago
- A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation☆62Mar 2, 2026Updated 2 weeks ago
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding.☆13Nov 19, 2024Updated last year
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …☆38Oct 17, 2024Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆80Jun 6, 2024Updated last year
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…☆55Jul 5, 2025Updated 8 months ago
- Code to the paper: The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence☆26Jul 31, 2025Updated 7 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆32Jun 23, 2025Updated 8 months ago
- Prompt Generator model for Stable Diffusion Models☆12Jun 20, 2023Updated 2 years ago
- ☆30Dec 14, 2025Updated 3 months ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"☆45Apr 21, 2024Updated last year
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆86Jan 19, 2025Updated last year
- Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)☆141Apr 7, 2025Updated 11 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"☆16Mar 31, 2025Updated 11 months ago
- codes for ICML2021 paper iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients☆10May 27, 2021Updated 4 years ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆89Mar 30, 2025Updated 11 months ago
- ☆14Jun 4, 2025Updated 9 months ago
- Explore, Establish, Exploit: Red Teaming Language Models from Scratch☆13Jun 21, 2023Updated 2 years ago
- Teaching a Convolutional Neural Network to recognize painting genre. Handcrafted dataset. Cool visualizations.☆10Dec 19, 2018Updated 7 years ago
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"☆10Aug 16, 2022Updated 3 years ago
- Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"☆12Aug 1, 2024Updated last year
- Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion☆11Apr 1, 2024Updated last year
- ☆60Jun 5, 2024Updated last year
- Tensorflow implementation of TrialAttack (Triple Adversarial Learning for Influence based Poisoning Attack in Recommender Systems. KDD 20…☆12Sep 2, 2021Updated 4 years ago
- A list of research towards security&privacy in AI-Generated Content☆16Jan 10, 2025Updated last year
- Adversarial Item Promotion in visually-aware recommenders☆16Sep 3, 2021Updated 4 years ago
- FamilyTool benchmark☆13Sep 10, 2025Updated 6 months ago
- Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS2024)☆13Mar 29, 2024Updated last year
- This repo contains the code for studying the interplay between quantization and sparsity methods☆26Feb 26, 2025Updated last year
- AN INTERACTIVE REMOTE SENSING CHANGE ANALYSIS MODEL BASED ON MULTIMODAL INSTRUCTION TUNING☆21Jun 16, 2025Updated 9 months ago
- 关于behance爬虫项目☆10May 16, 2019Updated 6 years ago
- [KDD'21] Official PyTorch implementation for "Data Poisoning Attack against Recommender System Using Incomplete and Perturbed Data".☆13Sep 19, 2021Updated 4 years ago