[NeurIPS 2025] BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
☆281Mar 13, 2026Updated last week
Alternatives and similar repositories for BackdoorLLM
Users that are interested in BackdoorLLM are comparing it to the libraries listed below
Sorting:
- Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]☆111Sep 27, 2024Updated last year
- ☆26Aug 21, 2024Updated last year
- [ICLR24] Official Repo of BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models☆50Jul 24, 2024Updated last year
- Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107☆20Aug 10, 2024Updated last year
- Safety at Scale: A Comprehensive Survey of Large Model Safety☆242Updated this week
- [ICLR 2025] BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks☆31Nov 2, 2025Updated 4 months ago
- ☆14Feb 26, 2025Updated last year
- [ICML 2025] X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP☆40Feb 3, 2026Updated last month
- ICL backdoor attack☆17Nov 4, 2024Updated last year
- AnyDoor: Test-Time Backdoor Attacks on Multimodal Large Language Models☆60Apr 8, 2024Updated last year
- Code for Neurips 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"☆60Jan 15, 2025Updated last year
- ☆37Oct 17, 2024Updated last year
- A survey on harmful fine-tuning attack for large language model☆235Feb 25, 2026Updated 3 weeks ago
- [ICLR2025] Detecting Backdoor Samples in Contrastive Language Image Pretraining☆19Feb 26, 2025Updated last year
- Composite Backdoor Attacks Against Large Language Models☆23Apr 12, 2024Updated last year
- This is the repository that introduces research topics related to protecting intellectual property (IP) of AI from a data-centric perspec…☆23Oct 30, 2023Updated 2 years ago
- Code for paper "Membership Inference Attacks Against Vision-Language Models"☆27Jan 25, 2025Updated last year
- [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"☆206Apr 12, 2025Updated 11 months ago
- This repo is the official implementation of the ICLR'23 paper "Towards Robustness Certification Against Universal Perturbations." We calc…☆12Feb 14, 2023Updated 3 years ago