ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks"
☆24 · Updated last month
Alternatives and similar repositories for ASTRA
Users interested in ASTRA are comparing it to the libraries listed below.
- ☆47 · Updated 2 months ago
- [ICLR 2024 Spotlight 🔥] [Best Paper Award SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal… ☆58 · Updated last year
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking… ☆22 · Updated 8 months ago
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking… ☆29 · Updated 8 months ago
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆18 · Updated 2 weeks ago
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆20 · Updated 8 months ago
- ☆46 · Updated last year
- ☆44 · Updated 6 months ago
- ECSO (make MLLMs safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆25 · Updated 7 months ago
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆23 · Updated this week
- ☆25 · Updated last month
- ☆48 · Updated 10 months ago
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ☆17 · Updated 2 months ago
- Repository for the paper "Refusing Safe Prompts for Multi-modal Large Language Models" ☆17 · Updated 8 months ago
- An implementation for MLLM oversensitivity evaluation ☆13 · Updated 7 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models ☆72 · Updated 5 months ago
- The official repository of "VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models" (NeurIPS 2… ☆55 · Updated 3 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-Language Models via Typographic Visual Prompts ☆144 · Updated 4 months ago
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆50 · Updated 5 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models ☆31 · Updated last month
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆10 · Updated 7 months ago
- Code for the ACM MM 2024 paper "White-box Multimodal Jailbreaks Against Large Vision-Language Models" ☆28 · Updated 5 months ago
- Accepted by ECCV 2024 ☆139 · Updated 8 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆81 · Updated last year
- Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Generator, Gener… ☆17 · Updated 8 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆59 · Updated 11 months ago
- [CVPR 2025] AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models ☆35 · Updated 3 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆55 · Updated last year
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆67 · Updated last month
- Code for the NeurIPS 2024 paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆14 · Updated last month