Repository for the Paper: Refusing Safe Prompts for Multi-modal Large Language Models
☆18Oct 16, 2024Updated last year
Alternatives and similar repositories for MLLM-Refusal
Users that are interested in MLLM-Refusal are comparing it to the libraries listed below
Sorting:
- ☆14Jun 6, 2023Updated 2 years ago
- An Embarrassingly Simple Backdoor Attack on Self-supervised Learning☆20Jan 24, 2024Updated 2 years ago
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.☆13Dec 16, 2024Updated last year
- ☆21Mar 20, 2025Updated 11 months ago
- ☆21Jul 25, 2025Updated 7 months ago
- ☆22Dec 14, 2023Updated 2 years ago
- Distribution Preserving Backdoor Attack in Self-supervised Learning☆20Jan 27, 2024Updated 2 years ago
- [ICCV 2023] HybridAugment++: Unified Frequency Spectra Perturbations for Model Robustness☆17Sep 28, 2023Updated 2 years ago
- Code for our ICLR 2023 paper Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples.☆18May 31, 2023Updated 2 years ago
- ☆10Aug 19, 2024Updated last year
- ☆23Oct 25, 2024Updated last year
- ☆52Feb 8, 2025Updated last year
- ☆59Jun 5, 2024Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4☆26Oct 24, 2024Updated last year
- Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models☆37Jun 1, 2025Updated 9 months ago
- Accepted by IJCAI-24 Survey Track☆231Aug 25, 2024Updated last year
- Awesome Jailbreak, red teaming arxiv papers (Automatically Update Every 12th hours)☆98Updated this week
- ☆11Dec 23, 2024Updated last year
- Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples☆30Jul 11, 2023Updated 2 years ago
- ☆35Feb 5, 2024Updated 2 years ago
- A collection of resources on attacks and defenses targeting text-to-image diffusion models☆92Dec 20, 2025Updated 2 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.☆85Jan 19, 2025Updated last year
- The MobSTr dataset provides artifacts that demonstrate Model-based Safety Assurance and Traceability for a safety-critical automotive sys…☆10Mar 18, 2022Updated 3 years ago
- ☆12May 6, 2022Updated 3 years ago
- (ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"☆74Feb 9, 2026Updated 3 weeks ago
- APBench: A Unified Availability Poisoning Attack and Defenses Benchmark (TMLR 08/2024)☆46Apr 15, 2025Updated 10 months ago
- Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models☆266May 13, 2024Updated last year
- Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)☆39Dec 24, 2023Updated 2 years ago
- ☆48Apr 7, 2025Updated 10 months ago
- BrainWash: A Poisoning Attack to Forget in Continual Learning☆12Apr 15, 2024Updated last year
- On the Robustness of GUI Grounding Models Against Image Attacks☆12Apr 8, 2025Updated 10 months ago
- Math24o: 高中奥林匹克数学竞赛测评集 High School Olympiad Mathematics Chinese Benchmark☆11Mar 27, 2025Updated 11 months ago
- [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agen…☆39Feb 14, 2026Updated 2 weeks ago
- [ICCV 2023] "TRM-UAP: Enhancing the Transferability of Data-Free Universal Adversarial Perturbation via Truncated Ratio Maximization", Yi…☆13Jul 17, 2024Updated last year
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]☆109Sep 27, 2024Updated last year
- Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing☆54Dec 17, 2024Updated last year
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents☆124Feb 19, 2025Updated last year
- This is the code repository for "Uncovering Safety Risks of Large Language Models through Concept Activation Vector"☆47Oct 13, 2025Updated 4 months ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos"☆20Dec 14, 2025Updated 2 months ago