AoiDragon / HADES
[ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models"
⭐ 34 · Updated 11 months ago
Alternatives and similar repositories for HADES
Users that are interested in HADES are comparing it to the libraries listed below
- [ICLR 2024 Spotlight 🔥] - [Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal… ⭐ 68 · Updated last year
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ⭐ 22 · Updated 3 months ago
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models ⭐ 51 · Updated 8 months ago
- ⭐ 47 · Updated 9 months ago
- ⭐ 48 · Updated last year
- ⭐ 37 · Updated last year
- This is an official repository of "VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models" (NeurIPS 2… ⭐ 57 · Updated 5 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ⭐ 74 · Updated 7 months ago
- ⭐ 61 · Updated 5 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ⭐ 83 · Updated last year
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking… ⭐ 28 · Updated 10 months ago
- ⭐ 50 · Updated last year
- ECSO (Make MLLM safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ⭐ 30 · Updated 10 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ⭐ 63 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ⭐ 170 · Updated 2 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ⭐ 57 · Updated last year
- [ICCV-2025] Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Gene… ⭐ 24 · Updated 2 months ago
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models ⭐ 144 · Updated 3 months ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ⭐ 37 · Updated 2 months ago
- Accepted by ECCV 2024 ⭐ 151 · Updated 11 months ago
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ⭐ 212 · Updated 8 months ago
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks) ⭐ 166 · Updated 2 months ago
- ⭐ 102 · Updated last year
- Code for NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ⭐ 53 · Updated 8 months ago
- Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage" ⭐ 16 · Updated 9 months ago
- ⭐ 43 · Updated 2 years ago
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime ⭐ 50 · Updated last year
- ⭐ 24 · Updated last year
- [NAACL 2025 Main] Official Implementation of MLLMU-Bench ⭐ 33 · Updated 6 months ago
- ECCV2024: Adversarial Prompt Tuning for Vision-Language Models ⭐ 27 · Updated 9 months ago