centerforaisafety/tdc2023-starter-kit

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/centerforaisafety/tdc2023-starter-kit)

centerforaisafety / tdc2023-starter-kit

This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.

☆92

Alternatives and similar repositories for tdc2023-starter-kit

Users that are interested in tdc2023-starter-kit are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

mmazeika / tdc-starter-kit
View on GitHub
Starter kit and data loading code for the Trojan Detection Challenge NeurIPS 2022 competition
☆32Jul 26, 2023Updated 2 years ago
RU-System-Software-and-Security / NONE
View on GitHub
☆10Oct 31, 2022Updated 3 years ago
chawins / pal
View on GitHub
PAL: Proxy-Guided Black-Box Attack on Large Language Models
☆57Aug 17, 2024Updated last year
AI-secure / Knowledge-Enhanced-Machine-Learning-Pipeline
View on GitHub
Repository for Knowledge Enhanced Machine Learning Pipeline (KEMLP)
☆10Jun 5, 2021Updated 5 years ago
ethz-spylab / rlhf_trojan_competition
View on GitHub
Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.
☆119Jun 13, 2024Updated 2 years ago
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
sail-sg / I-FSJ
View on GitHub
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
☆65Jan 11, 2025Updated last year
LLM-Tuning-Safety / LLMs-Finetuning-Safety
View on GitHub
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20…
☆358Feb 23, 2024Updated 2 years ago
real-absolute-AI / Unnatural_Language
View on GitHub
The official repository of 'Unnatural Language Are Not Bugs but Features for LLMs'
☆24May 20, 2025Updated last year
reds-lab / Meta-Sift
View on GitHub
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆20Apr 27, 2023Updated 3 years ago
wonderNefelibata / Awesome-LRM-Safety
View on GitHub
Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …
☆84Updated this week
kztakemoto / simbaja
View on GitHub
All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks
☆17Apr 24, 2024Updated 2 years ago
sail-sg / Cheating-LLM-Benchmarks
View on GitHub
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
☆86Oct 23, 2024Updated last year
Unispac / Fight-Poison-With-Poison
View on GitHub
Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
☆31Jul 11, 2023Updated 3 years ago
thestephencasper / explore_establish_exploit_llms
View on GitHub
☆31Jul 14, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Sandy-Zeng / NPAttack
View on GitHub
Pytorch implementation of NPAttack
☆12Jul 7, 2020Updated 6 years ago
PurduePAML / PICCOLO
View on GitHub
☆26Dec 1, 2022Updated 3 years ago
weichen-yu / LM-Extraction
View on GitHub
☆43May 23, 2023Updated 3 years ago
ejones313 / auditing-llms
View on GitHub
☆61Mar 9, 2023Updated 3 years ago
ydc123 / MMP-Attack
View on GitHub
Official repository for "On the Multi-modal Vulnerability of Diffusion Models"
☆17Jul 15, 2024Updated 2 years ago
reds-lab / BEEAR
View on GitHub
This is the official Gtihub repo for our paper: "BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Lang…
☆23Jul 3, 2024Updated 2 years ago
dongyp13 / memorization-AT
View on GitHub
☆20Mar 14, 2022Updated 4 years ago
VITA-Group / Random-Shuffling-BackdoorDetect
View on GitHub
[NeurIPS 2022] "Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets" by Ruisi Cai*, Zhenyu Zh…
☆21Oct 1, 2022Updated 3 years ago
aengusl / latent-adversarial-training
View on GitHub
☆48Sep 29, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Gwinhen / MOTH
View on GitHub
This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…
☆11Aug 24, 2022Updated 3 years ago
tml-epfl / llm-adaptive-attacks
View on GitHub
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
☆390Jan 23, 2025Updated last year
Unispac / Visual-Adversarial-Examples-Jailbreak-Large-Language-Models
View on GitHub
Repository for the Paper (AAAI 2024, Oral) --- Visual Adversarial Examples Jailbreak Large Language Models
☆281May 13, 2024Updated 2 years ago
VITA-Group / Backdoor-LTH
View on GitHub
[CVPR 2022] "Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free" by Tianlong Chen*, Zhenyu Zhang*, Yihua Zhang*, Shiyu C…
☆27Oct 5, 2022Updated 3 years ago
GraySwanAI / circuit-breakers
View on GitHub
Improving Alignment and Robustness with Circuit Breakers
☆266Sep 24, 2024Updated last year
inspire-group / DP-RandP
View on GitHub
[NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes
☆12Jun 12, 2023Updated 3 years ago
yunqing-me / AttackVLM
View on GitHub
[NeurIPS-2023] Annual Conference on Neural Information Processing Systems
☆231Dec 22, 2024Updated last year
locuslab / breaking-poisoned-classifier
View on GitHub
Code for paper "Poisoned classifiers are not only backdoored, they are fundamentally broken"
☆26Jan 7, 2022Updated 4 years ago
fra31 / rlhf-trojan-competition-submission
View on GitHub
☆19Feb 25, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
llm-attacks / llm-attacks
View on GitHub
Universal and Transferable Attacks on Aligned Language Models
☆4,741Aug 2, 2024Updated last year
centerforaisafety / HarmBench
View on GitHub
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
☆1,011Aug 16, 2024Updated last year
PurduePAML / K-ARM_Backdoor_Optimization
View on GitHub
☆18Jun 15, 2021Updated 5 years ago
xpf / Data-Efficient-Backdoor-Attacks
View on GitHub
Data-Efficient Backdoor Attacks
☆20Jun 15, 2022Updated 4 years ago
andyzoujm / breaking-llama-guard
View on GitHub
Code to break Llama Guard
☆32Dec 7, 2023Updated 2 years ago
Ekko-zn / IJCAI2022-Backdoor
View on GitHub
☆20May 6, 2022Updated 4 years ago
zhxieml / remiss-jailbreak
View on GitHub
☆33Jun 24, 2024Updated 2 years ago