paul-rottger / msts-multimodal-safety

Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"

☆16

Alternatives and similar repositories for msts-multimodal-safety

Users who are interested in msts-multimodal-safety are comparing it to the repositories listed below.

wicai24 / DOOR-Alignment
☆16 Updated 8 months ago
Dongping-Chen / MLLM-Judge
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
☆86 Updated 9 months ago
ml-research / LlavaGuard
☆62 Updated 2 months ago
eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆30 Updated 5 months ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆84 Updated 2 years ago
ys-zong / VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
☆81 Updated 10 months ago
gyhdog99 / ECSO
ECSO (Make MLLM safe without neither training nor any external models!) (https://arxiv.org/abs/2403.09572)
☆34 Updated last year
AI45Lab / MLLMGuard
☆42 Updated 5 months ago
xirui-li / MOSSBench
An implementation for MLLM oversensitivity evaluation
☆17 Updated last year
EchoseChen / SPA-VL-RLHF
The reinforcement learning codes for dataset SPA-VL
☆42 Updated last year
pipilurj / MLLM-protector
The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"
☆44 Updated last year
wenhuang2000 / VHTest
VHTest
☆15 Updated last year
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆32 Updated 8 months ago
shiqichen17 / VLM_Merging
Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)
☆86 Updated 2 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆50 Updated last year
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆49 Updated last year
Qinyu-Allen-Zhao / LVLM-LP
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
☆41 Updated last year
nishadsinghi / CleanCLIP
Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023
☆39 Updated last month
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82 Updated 9 months ago
wuxiyang1996 / AutoHallusion
AutoHallusion Codebase (EMNLP 2024)
☆21 Updated last year
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆70 Updated last year
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆46 Updated last week
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆27 Updated 4 months ago
chs20 / RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
☆148 Updated 6 months ago
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆87 Updated 9 months ago
vlm2-bench / VLM2-Bench
VLM2-Bench [ACL 2025 Main]: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
☆42 Updated 6 months ago
clemneo / llava-interp
☆76 Updated last year
shikiw / Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…
☆107 Updated 5 months ago
SaFoLab-WISC / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆67 Updated last year
shengliu66 / VTI
Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering
☆92 Updated last year