UCSB-AI/MSSBench

[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"

☆36

Alternatives and similar repositories for MSSBench

Users interested in MSSBench are comparing it to the repositories listed below.

xirui-li / MOSSBench
An implementation for MLLM oversensitivity evaluation
☆18 · Nov 16, 2024 · Updated last year
EchoSafe-MLLM / EchoSafe
[CVPR 2026] Code for Evolving Contextual Safety in Multi-Modal Large Language Models via Inference-Time Self-Reflective Memory
☆15 · Mar 18, 2026 · Updated 4 months ago
saferlhf-v / saferlhf-v
☆23 · Jun 16, 2025 · Updated last year
sinwang20 / SIUO
[NAACL 2025] SIUO: Cross-Modality Safety Alignment
☆125 · Jan 31, 2025 · Updated last year
wicai24 / DOOR-Alignment
☆20 · Apr 7, 2025 · Updated last year
UCSB-AI / SafeKey
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
☆16 · May 12, 2026 · Updated 2 months ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆218 · Oct 15, 2024 · Updated last year
itsvaibhav01 / Immune
[CVPR 2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28 · Jun 11, 2025 · Updated last year
UCSB-AI / FedVLN
[ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"
☆14 · Oct 8, 2022 · Updated 3 years ago
CryptoAILab / FigStep
[AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts
☆212 · Jun 26, 2025 · Updated last year
AI45Lab / VLSBench
[ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
☆62 · Jul 21, 2025 · Updated 11 months ago
ShenzheZhu / JailDAM
[COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
☆26 · Nov 25, 2025 · Updated 7 months ago
UCSB-AI / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 · Jul 15, 2025 · Updated last year
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆29 · Mar 15, 2025 · Updated last year
MLRM-Halu / MLRM-Halu
[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
☆82 · May 31, 2025 · Updated last year
thu-coai / Agent-SafetyBench
☆149 · Aug 11, 2025 · Updated 11 months ago
franciscoliu / SKU
Official code implementation of SKU, accepted by ACL 2024 Findings
☆20 · Dec 18, 2024 · Updated last year
thu-ml / MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
☆177 · Jun 27, 2025 · Updated last year
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89 · Feb 26, 2025 · Updated last year
chuhac / Reasoning-to-Defend
[EMNLP 2025] Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
☆12 · Aug 22, 2025 · Updated 10 months ago
SaFo-Lab / AdaShield
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shi…
☆73 · Feb 9, 2026 · Updated 5 months ago
shiningrain / JailGuard
☆32 · Mar 16, 2025 · Updated last year
MingyuJ666 / SEAttnGAN
[ICONIP'24] Mingyu.Jin's final year project
☆30 · Aug 23, 2024 · Updated last year
youngwanLEE / holisafe
[CVPR Findings 2026] HoliSafe: Holistic Safety Benchmarking and Modeling for Vision-Language Model
☆17 · Mar 8, 2026 · Updated 4 months ago
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆34 · Jul 20, 2025 · Updated last year
shengyin1224 / SafeAgentBench
Code for the paper "SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents"
☆74 · Feb 25, 2025 · Updated last year
NY1024 / SafeBench
☆22 · Oct 25, 2024 · Updated last year
PandragonXIII / CIDER
This is the official repository for Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models.
☆15 · Jan 16, 2025 · Updated last year
zhaoshiji123 / SI-Attack
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
☆16 · Aug 6, 2025 · Updated 11 months ago
google-research / preprocessor-aware-black-box-attack
☆21 · Mar 19, 2023 · Updated 3 years ago
OpenKG-ORG / EasyDetect
An Easy-to-use Hallucination Detection Framework for LLMs.
☆64 · Apr 21, 2024 · Updated 2 years ago
yueliu1999 / GuardReasoner-VL
[NeurIPS 2025] Official source code for the paper "GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning".
☆123 · Feb 22, 2026 · Updated 4 months ago
AI45Lab / MLLMGuard
☆46 · Jun 19, 2025 · Updated last year
KID-22 / LLM-Unlearning-Paper-List
☆28 · Dec 18, 2025 · Updated 7 months ago
EchoseChen / SPA-VL-RLHF
The reinforcement learning code for the SPA-VL dataset
☆48 · Jun 24, 2024 · Updated 2 years ago
thestephencasper / latent_adversarial_training
☆24 · Jul 25, 2024 · Updated last year
NJU-LINK / DRIFT
Design for Error Detection in Deep-Research Agents Trajectories.
☆22 · Jun 4, 2026 · Updated last month
UCSB-AI / HarnessAudit
Official codebase for the paper "Auditing Agent Harness Safety"
☆51 · May 19, 2026 · Updated 2 months ago
junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19 · Jun 1, 2024 · Updated 2 years ago