🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access
☆53 · Updated Jun 2, 2025
Alternatives and similar repositories for BAIT
Users interested in BAIT are comparing it to the repositories listed below.
- ☆17 · Updated Sep 4, 2024
- ☆15 · Updated Dec 29, 2023
- Siren: Byzantine-robust Federated Learning via Proactive Alarming (SoCC '21) · ☆11 · Updated Mar 28, 2024
- Implementation of the IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…" · ☆11 · Updated Aug 24, 2022
- ☆18 · Updated Aug 15, 2022
- ☆20 · Updated Feb 11, 2024
- [NDSS'23] BEAGLE: Forensics of Deep Learning Backdoor Attack for Better Defense · ☆17 · Updated May 7, 2024
- Official implementation of the NeurIPS 2024 paper "BiScope: AI-generated Text Detection by Checking Memorization of Preceding Tokens" · ☆28 · Updated Feb 17, 2026
- Distribution Preserving Backdoor Attack in Self-supervised Learning · ☆20 · Updated Jan 27, 2024
- ☆14 · Updated Feb 26, 2025
- ☆26 · Updated Dec 1, 2022
- [NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models · ☆281 · Updated Mar 13, 2026
- [Oakland 2024] Exploring the Orthogonality and Linearity of Backdoor Attacks · ☆28 · Updated Apr 15, 2025
- Implementation of the Implicit Knowledge Extraction Attack · ☆20 · Updated May 28, 2025
- Backdooring Neural Code Search · ☆14 · Updated Sep 8, 2023
- Composite Backdoor Attacks Against Large Language Models · ☆23 · Updated Apr 12, 2024
- [AAAI'21] Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification · ☆29 · Updated Dec 31, 2024
- [NDSS 2025] CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models · ☆25 · Updated Aug 20, 2025
- Code for the IEEE ICASSP 2024 paper "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models" (demo: 124.220.228.133:11107) · ☆20 · Updated Aug 10, 2024
- ☆18 · Updated Jun 15, 2021
- [ICLR24] Official repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models · ☆50 · Updated Jul 24, 2024
- Official repo for the FSE'24 paper "CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking" · ☆18 · Updated Mar 10, 2025
- ☆27 · Updated Aug 28, 2024
- Implementation of "Physical Attack on Monocular Depth Estimation with Optimal Adversarial Patches" · ☆26 · Updated Aug 31, 2022
- Example TrojAI Submission · ☆27 · Updated Dec 6, 2024
- Nyx: Detecting Exploitable Front-Running Vulnerabilities in Smart Contracts · ☆22 · Updated May 11, 2024
- Code for the AAAI 2021 paper "Towards Feature Space Adversarial Attack" · ☆30 · Updated Aug 24, 2021
- ☆13 · Updated May 1, 2024
- ☆19 · Updated Mar 9, 2024
- Code for the NDSS 2022 paper "MIRROR: Model Inversion for Deep Learning Network with High Fidelity" · ☆27 · Updated May 9, 2023
- [ICLR 2023; Best Paper Award at the ECCV'22 AROW Workshop] FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning · ☆60 · Updated Dec 11, 2024
- Code for the paper "The Philosopher's Stone: Trojaning Plugins of Large Language Models" · ☆28 · Updated Sep 11, 2024
- ☆37 · Updated Oct 17, 2024
- [ECCV'24] UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening · ☆10 · Updated Dec 18, 2025
- [NDSS 2025] CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling · ☆16 · Updated Jan 18, 2025
- ☆12 · Updated May 27, 2022
- ☆23 · Updated Jan 5, 2026
- Implementation of the CVPR 2022 oral paper "Better Trigger Inversion Optimization in Backdoor Scanning" · ☆24 · Updated Apr 5, 2022
- 🔮 Reasoning for Safer Code Generation; 🥇 winner solution of the Amazon Nova AI Challenge 2025 · ☆36 · Updated Aug 24, 2025