ASTRAL-Group / ASTRA
[CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks"
☆33 · Updated last week
Alternatives and similar repositories for ASTRA
Users interested in ASTRA are comparing it to the libraries listed below.
- [ICLR 2024 Spotlight 🔥] [Best Paper Award SoCal NLP 2023 🏆] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal… ☆61 · Updated last year
- ☆52 · Updated 3 months ago
- ☆47 · Updated last year
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆25 · Updated 8 months ago
- Accepted by ECCV 2024 ☆142 · Updated 9 months ago
- ☆44 · Updated 7 months ago
- ☆48 · Updated 11 months ago
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆156 · Updated 3 weeks ago
- Code for NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆50 · Updated 6 months ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆19 · Updated last month
- [ECCV'24 Oral] The official GitHub page for ''Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking … ☆29 · Updated 9 months ago
- ☆30 · Updated 2 months ago
- [NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning ☆10 · Updated 8 months ago
- This is an official repository of ``VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models'' (NeurIPS 2… ☆55 · Updated 3 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆55 · Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆23 · Updated 8 months ago
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks ☆19 · Updated 3 months ago
- [ICLR 2024] Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images ☆36 · Updated last year
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆74 · Updated 5 months ago
- ☆102 · Updated last year
- Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning" ☆14 · Updated 2 months ago
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks) ☆154 · Updated 2 weeks ago
- Code for ACM MM2024 paper: White-box Multimodal Jailbreaks Against Large Vision-Language Models ☆29 · Updated 6 months ago
- To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models ☆31 · Updated last month
- An implementation for MLLM oversensitivity evaluation ☆13 · Updated 8 months ago
- ECSO (Make MLLM safe without any training or external models!) (https://arxiv.org/abs/2403.09572) ☆28 · Updated 8 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆60 · Updated last year
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ☆205 · Updated 6 months ago
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆24 · Updated 3 weeks ago
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆48 · Updated 2 months ago