AoiDragon / HADES
[ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models''
☆36 · Updated last year
Alternatives and similar repositories for HADES
Users interested in HADES are comparing it to the repositories listed below.
- [ICLR 2024 Spotlight] [Best Paper Award SoCal NLP 2023] Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… · ☆77 · Updated last year
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models · ☆81 · Updated 10 months ago
- ☆54 · Updated last year
- ☆52 · Updated last year
- ☆65 · Updated 8 months ago
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models · ☆56 · Updated 11 months ago
- This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2… · ☆63 · Updated 8 months ago
- ☆52 · Updated last year
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment · ☆25 · Updated 6 months ago
- ECSO (make MLLMs safe without any training or external models!) (https://arxiv.org/abs/2403.09572) · ☆34 · Updated last year
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … · ☆32 · Updated last year
- ☆40 · Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" · ☆84 · Updated 2 years ago
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models · ☆148 · Updated 6 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts · ☆181 · Updated 5 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… · ☆67 · Updated last year
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models · ☆60 · Updated last year
- ☆26 · Updated last year
- Accepted by ECCV 2024 · ☆176 · Updated last year
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" · ☆57 · Updated 10 months ago
- ☆107 · Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4 · ☆25 · Updated last year
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks) · ☆172 · Updated 5 months ago
- An implementation for MLLM oversensitivity evaluation · ☆17 · Updated last year
- [ICCV 2025] Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Gene… · ☆31 · Updated 5 months ago
- Code for ACM MM 2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models · ☆30 · Updated 11 months ago
- [ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images · ☆41 · Updated last year
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… · ☆45 · Updated 5 months ago
- [NeurIPS 2023] Annual Conference on Neural Information Processing Systems · ☆222 · Updated 11 months ago
- The first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods! · ☆14 · Updated 8 months ago