AoiDragon / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models''
☆36 · Updated 11 months ago
Alternatives and similar repositories for HADES
Users that are interested in HADES are comparing it to the libraries listed below
- [ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ☆70 · Updated last year
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆76 · Updated 8 months ago
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆23 · Updated 3 months ago
- ☆48 · Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆83 · Updated last year
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models ☆53 · Updated 9 months ago
- ☆50 · Updated 10 months ago
- ☆52 · Updated last year
- ☆63 · Updated 6 months ago
- ECSO (Make MLLM safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆31 · Updated 11 months ago
- This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2… ☆57 · Updated 6 months ago
- ☆38 · Updated last year
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆57 · Updated last year
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆29 · Updated 11 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆64 · Updated last year
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models ☆145 · Updated 4 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆173 · Updated 3 months ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ☆39 · Updated 3 months ago
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ☆213 · Updated 9 months ago
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime ☆50 · Updated 2 years ago
- [ICCV-2025] Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Gene… ☆25 · Updated 2 months ago
- ☆101 · Updated last year
- Accepted by ECCV 2024 ☆156 · Updated 11 months ago
- An implementation for MLLM oversensitivity evaluation ☆14 · Updated 10 months ago
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆23 · Updated 11 months ago
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆55 · Updated 8 months ago
- [NAACL 2025 Main] Official Implementation of MLLMU-Bench ☆34 · Updated 6 months ago
- ☆24 · Updated last year
- Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage" ☆18 · Updated 10 months ago
- ☆43 · Updated 2 years ago