Xiangkui-Cao / VLBiasBench

A large-scale dataset composed of high-quality synthetic images aimed at evaluating social biases in LVLMs

☆13

Alternatives and similar repositories for VLBiasBench

Users who are interested in VLBiasBench are comparing it to the repositories listed below.

wellzline / Trustworthy_T2I_DMs
☆13 · Updated 4 months ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆84 · Updated 2 years ago
eric-ai-lab / MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
☆30 · Updated 5 months ago
xijia-tao / ImgTrojan
Code and data for "ImgTrojan: Jailbreaking Vision-Language Models with ONE Image"
☆25 · Updated 8 months ago
UCSC-VLAA / STAR-1
[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data
☆32 · Updated 8 months ago
nishadsinghi / CleanCLIP
Official PyTorch implementation of "CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning" @ ICCV 2023
☆39 · Updated last month
sail-sg / Meta-Unlearning
☆33 · Updated 7 months ago
saferlhf-v / saferlhf-v
☆19 · Updated 5 months ago
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆23 · Updated 2 weeks ago
wenhuang2000 / VHTest
VHTest
☆15 · Updated last year
thu-ml / MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
☆172 · Updated 5 months ago
haonan3 / ICML-2024-Oral-SilentBadDiffusion
☆13 · Updated last year
zhaoyuzhi / ICM-Assistant
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation. AAAI, 2025
☆12 · Updated 3 months ago
ChenWu98 / agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆121 · Updated 9 months ago
alchemistyzz / PeRL
[NeurIPS'25] The official code of "PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning"
☆27 · Updated 2 months ago
thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 · Updated last year
LzVv123456 / VISTA
☆62 · Updated 4 months ago
sail-sg / MMCBench
☆27 · Updated last year
chiayi-hsu / Ring-A-Bell
☆38 · Updated 10 months ago
yihuaihong / Dissecting-FT-Unlearning
☆14 · Updated last year
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
☆65 · Updated 8 months ago
ExplainableML / sae-for-vlm
[NeurIPS 2025] Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
☆46 · Updated last week
chs20 / RobustVLM
[ICML 2024] Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
☆148 · Updated 6 months ago
franciscoliu / MLLMU-Bench
[NAACL 2025 Main] Official Implementation of MLLMU-Bench
☆43 · Updated 8 months ago
yasamin-med / P2P
☆21 · Updated 4 months ago
wbopan / safety-residual-space
☆21 · Updated 8 months ago
umd-huang-lab / VLM-Poisoning
Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆57 · Updated 10 months ago
zihao-ai / unthinking_vulnerability
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
☆32 · Updated 6 months ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆23 · Updated last year
TreeLLi / APT
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
☆56 · Updated 11 months ago