Django-Jiang/BadChain

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Django-Jiang/BadChain)

Django-Jiang / BadChain

[ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

☆56

Alternatives and similar repositories for BadChain

Users that are interested in BadChain are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

Lucas-TY / llm_Implicit_reference
View on GitHub
Official Implementation of implicit reference attack
☆11Oct 16, 2024Updated last year
vinusankars / BEAST
View on GitHub
Implementation of BEAST adversarial attack for language models (ICML 2024)
☆89May 14, 2024Updated 2 years ago
zhmzm / FLDetector_pytorch
View on GitHub
☆17Feb 7, 2024Updated 2 years ago
ventr1c / RES-GCL
View on GitHub
An official PyTorch implementation of "Certifiably Robust Graph Contrastive Learning" (NeurIPS 2023)
☆11Jan 22, 2024Updated 2 years ago
SolidShen / BAIT
View on GitHub
🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access
☆57Jun 2, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
AfoninAndrei / ICLR2022
View on GitHub
Towards model-agnostic federated learning using knowledge distillation, ICLR 2022
☆10Mar 12, 2022Updated 4 years ago
clearloveclearlove / BEAT
View on GitHub
☆15Feb 26, 2025Updated last year
lancopku / agent-backdoor-attacks
View on GitHub
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆115Sep 27, 2024Updated last year
Bowen1911 / xJailbreak
View on GitHub
Code of paper: xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking"
☆17Apr 3, 2026Updated 3 months ago
Vincent-HKUSTGZ / PEFTGuard
View on GitHub
Official repository for PEFTGuard: Detecting Backdoor Attacks Against Parameter-Efficient Fine-Tuning, accepted at 2025 IEEE Symposium on…
☆18Jul 4, 2025Updated last year
jumxglhf / AKGNN
View on GitHub
The source code for Adaptive Kernel Graph Neural Network at AAAI2022
☆14Feb 23, 2022Updated 4 years ago
uw-nsl / CleanGen
View on GitHub
[EMNLP 24] Official Implementation of CLEANGEN: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
☆19Mar 9, 2025Updated last year
UNHSAILLab / working-memory-attack-on-llms
View on GitHub
Working Memory Attack on LLMs
☆18May 27, 2025Updated last year
dukeceicenter / jailbreak-reasoning-openai-o1o3-deepseek-r1
View on GitHub
☆121Apr 27, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
aiPenguin / StopReasoning
View on GitHub
☆15Oct 6, 2024Updated last year
YancyKahn / CoA
View on GitHub
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
☆39Jan 17, 2025Updated last year
Lyz1213 / BadEdit
View on GitHub
☆38Oct 17, 2024Updated last year
Kwai-Klear / CE-GPPO
View on GitHub
CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
☆16Jan 23, 2026Updated 5 months ago
ShoumikSaha / agent-skill-security
View on GitHub
☆15May 13, 2026Updated 2 months ago
AI-secure / AgentPoison
View on GitHub
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
☆230Jun 17, 2026Updated last month
CSUN1997 / Non-IID-federated-learning
View on GitHub
Reproduce Paper "Federated Learning with Non-IID Data"
☆10May 3, 2021Updated 5 years ago
baixianghuang / editing-attack
View on GitHub
Code and dataset for the paper: "Can Editing LLMs Inject Harm?" [AAAI'26]
☆21Dec 26, 2025Updated 6 months ago
eth-sri / smoothing-ensembles
View on GitHub
[ICLR 2022] Boosting Randomized Smoothing with Variance Reduced Classifiers
☆11Mar 29, 2022Updated 4 years ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
ethz-spylab / agentdojo
View on GitHub
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
☆674Jun 2, 2026Updated last month
13thDayOfLunarMay / TECB-attack
View on GitHub
The implementatioin code of paper: “A Practical Clean-Label Backdoor Attack with Limited Information in Vertical Federated Learning”
☆11Jul 1, 2023Updated 3 years ago
chichidd / llm-lora-trojan
View on GitHub
Code for paper "The Philosopher’s Stone: Trojaning Plugins of Large Language Models"
☆33Sep 11, 2024Updated last year
uiuc-kang-lab / InjecAgent
View on GitHub
☆152Jul 2, 2024Updated 2 years ago
meng-wenlong / LMSanitator
View on GitHub
☆29Aug 21, 2023Updated 2 years ago
VincentNi0107 / BadVLMDriver
View on GitHub
☆10May 8, 2024Updated 2 years ago
thunxxx / MLLM-Jailbreak-evaluation-MMJ-Bench
View on GitHub
☆81Mar 30, 2025Updated last year
alevine0 / DPA
View on GitHub
Code for the paper "Deep Partition Aggregation: Provable Defenses against General Poisoning Attacks"
☆14Aug 22, 2022Updated 3 years ago
wegodev2 / virtual-prompt-injection
View on GitHub
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27Jul 6, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
grasses / PoisonPrompt
View on GitHub
Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107
☆21Aug 10, 2024Updated last year
PKU-ML / PAT
View on GitHub
Code for NeurIPS 2024 Paper "Fight Back Against Jailbreaking via Prompt Adversarial Tuning"
☆22May 6, 2025Updated last year
johanernst / khPRF-PSA
View on GitHub
☆10May 31, 2023Updated 3 years ago
jiaxiaojunQAQ / I-GCG
View on GitHub
Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)
☆146Apr 7, 2025Updated last year
jinyuan-jia / BaggingCertifyDataPoisoning
View on GitHub
☆12Dec 9, 2020Updated 5 years ago
uw-nsl / safechain
View on GitHub
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆30Apr 2, 2025Updated last year
uw-nsl / ArtPrompt
View on GitHub
[ACL24] Official Repo of Paper `ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs`
☆102Aug 15, 2025Updated 11 months ago