The official repository of the paper "On the Exploitability of Instruction Tuning".
☆70 · Updated Feb 5, 2024
Alternatives and similar repositories for AutoPoison
Users interested in AutoPoison are comparing it to the repositories listed below.
- Official implementation of the GOAT model (ICML 2023) ☆38 · Updated Jul 3, 2023
- ☆33 · Updated Nov 27, 2023
- ☆16 · Updated Jul 17, 2022
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆33 · Updated Sep 28, 2025
- Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion ☆11 · Updated Apr 1, 2024
- Code for the paper "RemovalNet: DNN model fingerprinting removal attack" (IEEE TDSC 2023) ☆10 · Updated Nov 27, 2023
- A simple and efficient baseline for data attribution ☆11 · Updated Nov 10, 2023
- Official repo for "Detecting, Explaining, and Mitigating Memorization in Diffusion Models" (ICLR 2024) ☆78 · Updated Apr 3, 2024
- ☆24 · Updated Jan 27, 2022
- ☆54 · Updated Sep 11, 2021
- [ICLR'21] Dataset Inference for Ownership Resolution in Machine Learning ☆32 · Updated Oct 10, 2022
- Official PyTorch repo of CVPR'23 and NeurIPS'23 papers on understanding replication in diffusion models ☆113 · Updated Nov 22, 2023
- Code for the CVPR 2020 paper "Adversarial Vertex Mixup: Toward Better Adversarially Robust Generalization" ☆13 · Updated Jul 13, 2020
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆66 · Updated Apr 24, 2024
- What do we learn from inverting CLIP models? ☆58 · Updated Mar 6, 2024
- Official code for the publication "The Close Relationship Between Contrastive Learning and Meta-Learning" ☆18 · Updated Sep 19, 2022
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆70 · Updated Feb 22, 2024
- ☆11 · Updated Oct 20, 2023
- [ICML 2023] Protecting Language Generation Models via Invisible Watermarking ☆13 · Updated Sep 8, 2023
- ☆12 · Updated Apr 22, 2024
- ☆11 · Updated Mar 13, 2023
- ☆69 · Updated Feb 17, 2024
- PyTorch datasets for Easy-To-Hard ☆29 · Updated Jan 9, 2025
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con…" ☆50 · Updated Dec 20, 2023
- Official code release for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks" ☆40 · Updated Feb 11, 2026
- ☆10 · Updated Jul 28, 2022
- ☆14 · Updated Jun 4, 2025
- Official PyTorch implementation of the ACM MM '19 paper "MetaAdvDet: Towards Robust Detection of Evolving Adversarial Attacks" ☆11 · Updated Jun 7, 2021
- PyTorch ImageNet-1k loader with bounding boxes ☆13 · Updated Jan 23, 2022
- ☆11 · Updated Oct 2, 2023
- Implementation of Self-supervised Online Adversarial Purification ☆13 · Updated Aug 2, 2021
- All code and data necessary to replicate experiments in the paper "BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Model…" ☆13 · Updated Sep 16, 2024
- [NeurIPS 2024] "Self-Calibrated Tuning of Vision-Language Models for Out-of-Distribution Detection" ☆13 · Updated Oct 28, 2024
- ☆15 · Updated Apr 7, 2023
- [ECCV 2024] Immunizing Text-to-Image Models against Malicious Adaptation ☆17 · Updated Jan 17, 2025
- Code for reproducing the results of the paper "Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness", published at IC… ☆27 · Updated Apr 29, 2020
- [NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong, … ☆15 · Updated Nov 27, 2023
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆32 · Updated Jul 9, 2024