☆22 · Oct 25, 2024 · Updated last year
Alternatives and similar repositories for SafeBench
Users that are interested in SafeBench are comparing it to the libraries listed below.
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆31 · Jul 20, 2025 · Updated 9 months ago
- ☆69 · Jun 1, 2025 · Updated 11 months ago
- ☆27 · Mar 17, 2025 · Updated last year
- ECSO (Make MLLM safe without neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆36 · Nov 2, 2024 · Updated last year
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding" ☆22 · Dec 8, 2024 · Updated last year
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ☆31 · Nov 2, 2025 · Updated 5 months ago
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ☆24 · Jul 26, 2024 · Updated last year
- ☆45 · Jun 19, 2025 · Updated 10 months ago
- ☆27 · Jun 5, 2024 · Updated last year
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding. ☆13 · Nov 19, 2024 · Updated last year
- ☆76 · Mar 30, 2025 · Updated last year
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆40 · Oct 17, 2024 · Updated last year
- Accepted by ECCV 2024 ☆203 · Oct 15, 2024 · Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆81 · Jun 6, 2024 · Updated last year
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ☆60 · Jul 5, 2025 · Updated 9 months ago
- Code for ICCV2025 paper——IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves ☆17 · Jul 11, 2025 · Updated 9 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆33 · Jun 23, 2025 · Updated 10 months ago
- Prompt Generator model for Stable Diffusion Models ☆12 · Jun 20, 2023 · Updated 2 years ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆46 · Apr 21, 2024 · Updated 2 years ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆87 · Jan 19, 2025 · Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ☆10 · Jul 22, 2024 · Updated last year
- ☆31 · Dec 14, 2025 · Updated 4 months ago
- Improved techniques for optimization-based jailbreaking on large language models (ICLR2025) ☆144 · Apr 7, 2025 · Updated last year
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆90 · Mar 30, 2025 · Updated last year
- codes for ICML2021 paper iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients ☆10 · May 27, 2021 · Updated 4 years ago
- Teaching a Convolutional Neural Network to recognize painting genre. Handcrafted dataset. Cool visualizations. ☆10 · Dec 19, 2018 · Updated 7 years ago
- template for https://cnli.me ☆10 · Feb 27, 2025 · Updated last year
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations" ☆10 · Aug 16, 2022 · Updated 3 years ago
- Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems" ☆12 · Aug 1, 2024 · Updated last year
- [TOIS'24] "RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation" ☆16 · Dec 1, 2024 · Updated last year
- Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion ☆11 · Apr 1, 2024 · Updated 2 years ago
- Explore, Establish, Exploit: Red Teaming Language Models from Scratch ☆15 · Jun 21, 2023 · Updated 2 years ago
- Tensorflow implementation of TrialAttack (Triple Adversarial Learning for Influence based Poisoning Attack in Recommender Systems. KDD 20… ☆12 · Sep 2, 2021 · Updated 4 years ago
- Adversarial Item Promotion in visually-aware recommenders ☆17 · Sep 3, 2021 · Updated 4 years ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆17 · Mar 31, 2025 · Updated last year
- The first toolkit for MLRM safety evaluation, providing unified interface for mainstream models, datasets, and jailbreaking methods! ☆15 · Apr 8, 2025 · Updated last year
- ☆25 · Apr 10, 2025 · Updated last year
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning ☆22 · Jul 8, 2024 · Updated last year
- Code for Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks (TIFS2024) ☆13 · Mar 29, 2024 · Updated 2 years ago