☆22Oct 25, 2024Updated last year
Alternatives and similar repositories for SafeBench
Users that are interested in SafeBench are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"☆34Jul 20, 2025Updated 10 months ago
- ☆70Jun 1, 2025Updated last year
- ☆27Mar 17, 2025Updated last year
- The official repository for guided jailbreak benchmark☆29Jul 28, 2025Updated 10 months ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Dec 8, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks☆31Nov 2, 2025Updated 7 months ago
- ☆45Jun 19, 2025Updated 11 months ago
- ☆27Jun 5, 2024Updated 2 years ago
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding.☆13Nov 19, 2024Updated last year
- ☆79Mar 30, 2025Updated last year
- Accepted by ECCV 2024☆208Oct 15, 2024Updated last year
- [ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal…☆86Jun 6, 2024Updated 2 years ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆207Jun 26, 2025Updated 11 months ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre…☆61Jul 5, 2025Updated 11 months ago
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack–Defense Evaluation☆70May 8, 2026Updated last month
- Code to the paper: The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence☆32Jul 31, 2025Updated 10 months ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆35Jun 23, 2025Updated 11 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)☆21Feb 16, 2024Updated 2 years ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆89Jan 19, 2025Updated last year
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"☆46Apr 21, 2024Updated 2 years ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- ☆35Dec 14, 2025Updated 5 months ago
- Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)☆145Apr 7, 2025Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆89Mar 30, 2025Updated last year
- codes for ICML2021 paper iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients☆10May 27, 2021Updated 5 years ago
- template for https://cnli.me☆10Feb 27, 2025Updated last year
- Code and data for the ACM CIKM 2022 paper "Rank List Sensitivity of Recommender Systems to Interaction Perturbations"☆10Aug 16, 2022Updated 3 years ago
- Code and data for the ACM CIKM 2024 paper "Adversarial Text Rewriting for Text-aware Recommender Systems"☆12Aug 1, 2024Updated last year
- [TOIS'24] "RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation"☆16Dec 1, 2024Updated last year
- Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion☆11Apr 1, 2024Updated 2 years ago
- A simple academic poster template using pptx, for NeurIPS/ICML/ICCV.☆29Aug 1, 2025Updated 10 months ago
- Explore, Establish, Exploit: Red Teaming Language Models from Scratch☆15Jun 21, 2023Updated 2 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Tensorflow implementation of TrialAttack (Triple Adversarial Learning for Influence based Poisoning Attack in Recommender Systems. KDD 20…☆12Sep 2, 2021Updated 4 years ago
- Adversarial Item Promotion in visually-aware recommenders☆17Sep 3, 2021Updated 4 years ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"☆19Mar 31, 2025Updated last year
- ☆25Apr 10, 2025Updated last year
- Finite Element Analysis for Tactile Sensing (FEATS)☆27Oct 1, 2025Updated 8 months ago
- [ICLR 2025] Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆23Jul 8, 2024Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"☆88Nov 28, 2023Updated 2 years ago