thu-ml / MLA-Trust
A toolbox for benchmarking the trustworthiness of Multimodal LLM Agents across the truthfulness, controllability, safety, and privacy dimensions through 34 interactive tasks
☆63 · Updated Jan 9, 2026
Alternatives and similar repositories for MLA-Trust
Users interested in MLA-Trust are comparing it to the repositories listed below.
- [ICML 2025] X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP ☆37 · Updated Feb 3, 2026
- On the Robustness of GUI Grounding Models Against Image Attacks ☆12 · Updated Apr 8, 2025
- [NDSS'25] The official implementation of safety misalignment. ☆17 · Updated Jan 8, 2025
- [CVPR 2024 Highlight] Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning ☆19 · Updated Jun 14, 2024
- SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types ☆24 · Updated Nov 29, 2024
- ☆36 · Updated Feb 2, 2026
- ☆115 · Updated Jul 2, 2024
- ☆21 · Updated Mar 17, 2025
- Enterprise AI Security Platform - Real-time firewall protection for LLM applications against prompt injection, data leakage, and function… ☆23 · Updated Sep 14, 2025
- ☆34 · Updated Jul 12, 2024
- [NeurIPS 2025] StegoZip: Enhancing Linguistic Steganography Payload in Practice with Large Language Models ☆24 · Updated Dec 4, 2025
- ☆11 · Updated Dec 23, 2024
- Official repository for ReasonGen-R1 ☆74 · Updated Jun 23, 2025
- ☆14 · Updated Aug 7, 2025
- ☆12 · Updated May 6, 2022
- This repo covers the safety topic, including attacks, defenses, and studies related to reasoning and RL ☆59 · Updated Sep 5, 2025
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT