ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆18Updated last month
Alternatives and similar repositories for JailDAM
Users who are interested in JailDAM are comparing it to the libraries listed below.
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆44Updated 7 months ago
- The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?☆31Updated 9 months ago
- ☆18Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆77Updated last year
- Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023☆36Updated last year
- ☆27Updated 3 months ago
- ☆17Updated 8 months ago
- ☆27Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"☆81Updated last year
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆77Updated 2 months ago
- Official code base for "Long-Tailed Diffusion Models With Oriented Calibration" ICLR2024☆14Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆42Updated 9 months ago
- One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models☆50Updated 7 months ago
- [CVPR 2025] Official implementation of the paper "Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters The…☆19Updated 5 months ago
- VHTest☆13Updated 9 months ago
- [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…☆62Updated last year
- This is the official PyTorch Implementation of "SoTTA: Robust Test-Time Adaptation on Noisy Data Streams (NeurIPS '23)" by Taesik Gong*, …☆22Updated last year
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆21Updated last month
- ☆21Updated 9 months ago
- Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)☆17Updated last month
- Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering☆66Updated 8 months ago
- This repository houses the code for the paper - "The Neglected of VLMs"☆28Updated 3 months ago
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"☆38Updated last year
- Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25)☆79Updated 2 months ago
- ☆28Updated 4 months ago
- EMPO, A Fully Unsupervised RLVR Method☆56Updated last week
- [ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models☆138Updated 2 months ago
- [CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment☆22Updated 2 months ago
- ☆62Updated 9 months ago
- [ICLR 2025] Code for Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models☆19Updated 3 months ago