ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks"
☆20 · Updated last week
Alternatives and similar repositories for ASTRA
Users interested in ASTRA are comparing it to the libraries listed below.
- [ICLR 2024 Spotlight] [Best Paper Award, SoCal NLP 2023] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal… ☆56 · Updated last year
- ☆46 · Updated 2 months ago
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆20 · Updated 7 months ago
- ☆43 · Updated 6 months ago
- ☆42 · Updated last year
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …" ☆28 · Updated 7 months ago
- [CVPR 2025] Official repository for "IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment" ☆17 · Updated this week
- [ICLR 2025] PyTorch implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆21 · Updated 2 weeks ago
- ECSO (make MLLMs safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆23 · Updated 7 months ago
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking …" ☆19 · Updated 7 months ago
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆49 · Updated 4 months ago
- ☆47 · Updated 9 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models ☆71 · Updated 4 months ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆63 · Updated 3 weeks ago
- ☆26 · Updated 2 weeks ago
- [ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images ☆35 · Updated last year
- An implementation for MLLM oversensitivity evaluation ☆13 · Updated 6 months ago
- Accepted by ECCV 2024 ☆130 · Updated 7 months ago
- [AAAI'25 Oral] Jailbreaking Large Vision-Language Models via Typographic Visual Prompts ☆141 · Updated 3 months ago
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ☆17 · Updated last month
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆54 · Updated last year
- ☆22 · Updated 9 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models ☆30 · Updated 2 weeks ago
- Universal Adversarial Attack, Multimodal Adversarial Attacks, VLP models, Contrastive Learning, Cross-modal Perturbation Generator, Gener… ☆17 · Updated 7 months ago
- Code for the ICLR 2025 paper "Failures to Find Transferable Image Jailbreaks Between Vision-Language Models" ☆27 · Updated this week
- [ECCV 2024] Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆80 · Updated last year
- Code for the NeurIPS 2024 paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆12 · Updated last month
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…" ☆58 · Updated 10 months ago
- ☆98 · Updated last year
- ☆34 · Updated 10 months ago