AoiDragon / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models''
☆36 Updated last year
Alternatives and similar repositories for HADES
Users that are interested in HADES are comparing it to the libraries listed below
- [ICLR 2024 Spotlight] [Best Paper Award SoCal NLP 2023] Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆73 Updated last year
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆24 Updated 4 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆77 Updated 9 months ago
- ☆63 Updated 7 months ago
- ☆52 Updated 10 months ago
- This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2… ☆58 Updated 7 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆83 Updated last year
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models ☆56 Updated 10 months ago
- ☆38 Updated last year
- ☆48 Updated last year
- ☆53 Updated last year
- ECSO (Make MLLMs safe without any training or external models!) (https://arxiv.org/abs/2403.09572) ☆33 Updated 11 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆65 Updated last year
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ☆216 Updated 10 months ago
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models ☆147 Updated 4 months ago
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆30 Updated last year
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆58 Updated last year
- [ICCV-2025] Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Gene… ☆27 Updated 3 months ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ☆43 Updated 3 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆177 Updated 4 months ago
- ☆105 Updated last year
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆55 Updated 9 months ago
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime ☆51 Updated 2 years ago
- ☆43 Updated 2 years ago
- Accepted by ECCV 2024 ☆165 Updated last year
- Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral] ☆65 Updated 2 years ago
- Code for ACM MM 2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆30 Updated 10 months ago
- ☆25 Updated last year
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ☆27 Updated 6 months ago
- A package that achieves a 95%+ transfer attack success rate against GPT-4 ☆23 Updated last year