ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆82 · Updated 11 months ago
Alternatives and similar repositories for VLGuard
Users interested in VLGuard are comparing it to the repositories listed below.
- ECSO (Make MLLMs safe with neither training nor any external models!) (https://arxiv.org/abs/2403.09572) ☆36 · Updated last year
- [ICLR 2024 Spotlight] [Best Paper Award SoCal NLP 2023] Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal… ☆77 · Updated last year
- Accepted by ECCV 2024 ☆179 · Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆84 · Updated 2 years ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi… ☆67 · Updated last year
- An implementation for MLLM oversensitivity evaluation ☆17 · Updated last year
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆58 · Updated 11 months ago
- ☆66 · Updated 9 months ago
- The official repository for the paper "MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance" ☆44 · Updated last year
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆27 · Updated 5 months ago
- A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks) ☆173 · Updated 6 months ago
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models ☆60 · Updated last year
- ☆44 · Updated 6 months ago
- [ECCV'24 Oral] The official GitHub page for "Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking… ☆36 · Updated last year
- [ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆52 · Updated 5 months ago
- [NeurIPS-2023] Annual Conference on Neural Information Processing Systems ☆222 · Updated last year
- Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning" ☆86 · Updated 10 months ago
- Accepted by IJCAI-24 Survey Track ☆225 · Updated last year
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning… ☆83 · Updated 4 months ago
- [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and fur… ☆84 · Updated 7 months ago
- Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime ☆55 · Updated 2 years ago
- [CVPR 2025] Official implementation for "Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbre… ☆47 · Updated 5 months ago
- The reinforcement learning code for the SPA-VL dataset ☆42 · Updated last year
- [CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment ☆26 · Updated 6 months ago
- ☆54 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆184 · Updated 6 months ago
- ☆55 · Updated last year
- A package that achieves 95%+ transfer attack success rate against GPT-4 ☆25 · Updated last year
- ☆42 · Updated last year
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as… ☆79 · Updated this week