ShenzheZhu / JailDAM

[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

☆23

Alternatives and similar repositories for JailDAM

Users who are interested in JailDAM are comparing it to the repositories listed below.

eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆30 Updated 5 months ago
SaFoLab-WISC / FIUBench
A Task of Fictitious Unlearning for VLMs
☆24 Updated 7 months ago
kigb / DropoutDecoding
[NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"
☆22 Updated 11 months ago
wicai24 / DOOR-Alignment
☆16 Updated 7 months ago
SaFoLab-WISC / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆68 Updated last year
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆84 Updated 2 years ago
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆43 Updated 7 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆48 Updated last year
clemneo / llava-interp
☆73 Updated last year
nishadsinghi / CleanCLIP
Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023
☆39 Updated last month
Qinyu-Allen-Zhao / LVLM-LP
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
☆41 Updated last year
keven980716 / weak-to-strong-deception
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆13 Updated last year
sail-sg / Meta-Unlearning
☆33 Updated 7 months ago
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆80 Updated last month
gyhdog99 / ECSO
ECSO (Make MLLM safe without neither training nor any external models!) (https://arxiv.org/abs/2403.09572)
☆34 Updated last year
MajorDavidZhang / MCL
code for Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
☆18 Updated last year
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆46 Updated 11 months ago
s-vco / s-vco
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
☆16 Updated 5 months ago
ZhentingWang / DUMP
☆32 Updated 6 months ago
ChnQ / TracingLLM
☆30 Updated last year
itsvaibhav01 / Immune
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆25 Updated 5 months ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆86 Updated 9 months ago
haonan3 / V1
V1: Toward Multimodal Reasoning by Designing Auxiliary Task
☆36 Updated 7 months ago
nickjiang2378 / vlm-hallucinations
[ICLR '25] Official Pytorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations"
☆92 Updated 6 months ago
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆32 Updated 7 months ago
alchemistyzz / PeRL
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆25 Updated 2 months ago
jiaangli / VLCA
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16 Updated last year
gyhdog99 / RACRO2
Official PyTorch implementation of RACRO (https://www.arxiv.org/abs/2506.04559)
☆19 Updated 4 months ago
OpenCausaLab / CELLO
☆21 Updated last year
pipilurj / MLLM-protector
The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
☆44 Updated last year