AI-secure/DecodingTrust

A Comprehensive Assessment of Trustworthiness in GPT Models

☆314

Alternatives and similar repositories for DecodingTrust

Users who are interested in DecodingTrust are comparing it to the libraries listed below.

AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆29 · Mar 15, 2025 · Updated last year
lancopku / SOS
Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)
☆24 · Dec 9, 2021 · Updated 4 years ago
decoding-comp-trust / comp-trust
Codebase for decoding compressed trust.
☆27 · May 7, 2024 · Updated 2 years ago
centerforaisafety / HarmBench
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆1,011 · Aug 16, 2024 · Updated last year
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆92 · May 19, 2024 · Updated 2 years ago
LLM-Tuning-Safety / LLMs-Finetuning-Safety
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358 · Feb 23, 2024 · Updated 2 years ago
Yu-Fangxu / COLD-Attack
[ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
☆176 · Dec 18, 2024 · Updated last year
RICommunity / TAP
TAP: An automated jailbreaking method for black-box LLMs
☆241 · Dec 10, 2024 · Updated last year
patrickrchao / JailbreakingLLMs
☆756 · Jul 2, 2025 · Updated last year
JailbreakBench / jailbreakbench
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]
☆632 · Apr 4, 2025 · Updated last year
HowieHwong / TrustLLM
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
☆629 · Jun 24, 2025 · Updated last year
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆281 · May 13, 2024 · Updated 2 years ago
CryptoAILab / Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
☆2,019 · Jun 17, 2026 · Updated last month
Princeton-SysML / Jailbreak_LLM
☆203 · Nov 26, 2023 · Updated 2 years ago
YihanWang617 / llm-jailbreaking-defense
A lightweight library for large language model (LLM) jailbreaking defense.
☆61 · Sep 11, 2025 · Updated 10 months ago
lapisrocks / rpo
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62 · Aug 8, 2024 · Updated last year
stanford-crfm / air-bench-2024
AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
☆30 · Aug 14, 2024 · Updated last year
llm-attacks / llm-attacks
Universal and Transferable Attacks on Aligned Language Models
☆4,741 · Aug 2, 2024 · Updated last year
CryptoAILab / MergeGuard
[CCS-LAMPS'24] LLM IP Protection Against Model Merging
☆16 · Oct 14, 2024 · Updated last year
chawins / llm-sp
Papers and resources related to the security and privacy of LLMs 🤖
☆579 · Jun 8, 2025 · Updated last year
CHATS-lab / persuasive_jailbreaker
Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!
☆363 · Oct 17, 2025 · Updated 9 months ago
AI-secure / RedCode
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
☆85 · Apr 24, 2026 · Updated 2 months ago
VITA-Group / Shake-to-Leak
[SatML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk
☆16 · Mar 15, 2025 · Updated last year
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆296 · Jul 28, 2025 · Updated 11 months ago
RU-System-Software-and-Security / NONE
☆10 · Oct 31, 2022 · Updated 3 years ago
SCLBD / Effective_backdoor_defense
☆14 · Oct 7, 2022 · Updated 3 years ago
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆243 · Nov 3, 2023 · Updated 2 years ago
kztakemoto / simbaja
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17 · Apr 24, 2024 · Updated 2 years ago
stanford-crfm / helm
Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models …
☆2,857 · Jul 1, 2026 · Updated 2 weeks ago
thu-ml / Attack-Bard
☆108 · Feb 16, 2024 · Updated 2 years ago
thestephencasper / explore_establish_exploit_llms
☆31 · Jul 14, 2023 · Updated 3 years ago
jinghuichen / AWM
GitHub repo for One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NeurIPS 2022)
☆15 · Jan 3, 2023 · Updated 3 years ago
yfchen1994 / poisoning_membership
☆20 · Oct 28, 2025 · Updated 8 months ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆40 · Dec 24, 2023 · Updated 2 years ago
usail-hkust / JailTrickBench
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. Empirical tricks for LLM Jailbreaking. (NeurIPS 2024)
☆166 · Nov 30, 2024 · Updated last year
Aatrox103 / SAP
☆49 · May 9, 2024 · Updated 2 years ago
EasyJailbreak / EasyJailbreak
An easy-to-use Python framework to generate adversarial jailbreak prompts.
☆873 · Mar 30, 2026 · Updated 3 months ago
csdongxian / ANP_backdoor
Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"
☆65 · May 8, 2023 · Updated 3 years ago
Improbable-AI / curiosity_redteam
Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…
☆90 · Mar 15, 2024 · Updated 2 years ago