AlexWan0/Poisoning-Instruction-Tuned-Models

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/AlexWan0/Poisoning-Instruction-Tuned-Models)

AlexWan0 / Poisoning-Instruction-Tuned-Models

☆59

Alternatives and similar repositories for Poisoning-Instruction-Tuned-Models

Users that are interested in Poisoning-Instruction-Tuned-Models are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ethz-spylab / rlhf-poisoning
View on GitHub
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆67Apr 24, 2024Updated 2 years ago
wegodev2 / virtual-prompt-injection
View on GitHub
Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"
☆27Jul 6, 2024Updated 2 years ago
zqypku / mm_poison
View on GitHub
☆21Oct 25, 2023Updated 2 years ago
Lyz1213 / BadEdit
View on GitHub
☆38Oct 17, 2024Updated last year
xzhou98 / GBTL-attack
View on GitHub
☆18Jun 4, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
xszheng2020 / memorization
View on GitHub
An Empirical Study of Memorization in NLP (ACL 2022)
☆13Jun 22, 2022Updated 4 years ago
D0miH / does-clip-know-my-face
View on GitHub
Source Code for the JAIR Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-know-my-face)
☆15Jul 9, 2024Updated 2 years ago
multimodalbags / BAGS_Multimodal
View on GitHub
Backdooring Multimodal Learning
☆30May 4, 2023Updated 3 years ago
lancopku / DAN
View on GitHub
[Findings of EMNLP 2022] Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
☆13Feb 26, 2023Updated 3 years ago
lancopku / FedMNMT
View on GitHub
[Findings of ACL 2023] Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter
☆12Sep 4, 2023Updated 2 years ago
shuaizhao95 / ICLAttack
View on GitHub
ICL backdoor attack
☆17Nov 4, 2024Updated last year
grasses / PoisonPrompt
View on GitHub
Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024. Demo//124.220.228.133:11107
☆21Aug 10, 2024Updated last year
HanxunH / CognitiveDistillation
View on GitHub
[ICLR2023] Distilling Cognitive Backdoor Patterns within an Image
☆37Oct 29, 2025Updated 9 months ago
kohpangwei / data-poisoning-journal-release
View on GitHub
☆18Sep 29, 2020Updated 5 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
pietrolesci / memorisation-profiles
View on GitHub
This is the official implementation for our ACL 2024 paper: "Causal Estimation of Memorisation Profiles".
☆25Mar 25, 2025Updated last year
microsoft / analysing_pii_leakage
View on GitHub
The repository contains the code for analysing the leakage of personally identifiable (PII) information from the output of next word pred…
☆105Aug 13, 2024Updated last year
lancopku / agent-backdoor-attacks
View on GitHub
Code&Data for the paper "Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents" [NeurIPS 2024]
☆116Sep 27, 2024Updated last year
T1aNS1R / Evil-Geniuses
View on GitHub
☆71Feb 4, 2024Updated 2 years ago
SRI-CSL / Trinity-TrojAI
View on GitHub
This repository contains code developed by the SRI team for the IARPA/TrojAI program.
☆21Jul 1, 2021Updated 5 years ago
ClonedOne / MalwareBackdoors
View on GitHub
Code for the paper Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers
☆61Apr 29, 2022Updated 4 years ago
PurduePAML / K-ARM_Backdoor_Optimization
View on GitHub
☆18Jun 15, 2021Updated 5 years ago
spike-imperial / FastLAS
View on GitHub
☆25Jan 27, 2026Updated 6 months ago
ZhangShiyue / extractive_is_not_faithful
View on GitHub
☆17May 19, 2023Updated 3 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
CODAIT / Identifying-Incorrect-Labels-In-CoNLL-2003
View on GitHub
Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.
☆12May 11, 2021Updated 5 years ago
alon-albalak / online-data-mixing
View on GitHub
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14Jan 9, 2024Updated 2 years ago
lorenzogentile404 / feldman-verifiable-secret-sharing
View on GitHub
☆10Apr 29, 2020Updated 6 years ago
illidanlab / inversion-influence-function
View on GitHub
Official codes for "Understanding Deep Gradient Leakage via Inversion Influence Functions", NeurIPS 2023
☆16Oct 13, 2023Updated 2 years ago
mli / mx-theme
View on GitHub
☆10Jun 29, 2022Updated 4 years ago
tml-epfl / long-is-more-for-alignment
View on GitHub
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆21May 2, 2024Updated 2 years ago
thunlp / BlockFFN
View on GitHub
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19Jan 10, 2026Updated 6 months ago
mireshghallah / ft-memorization
View on GitHub
☆13Oct 20, 2022Updated 3 years ago
yangarbiter / rare-spurious-correlation
View on GitHub
Understanding Rare Spurious Correlations in Neural Network
☆12Jun 5, 2022Updated 4 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
ChenWu98 / agent-attack
View on GitHub
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
☆140Feb 19, 2025Updated last year
CryptoAILab / MergeGuard
View on GitHub
[CCS-LAMPS'24] LLM IP Protection Against Model Merging
☆16Oct 14, 2024Updated last year
lapisrocks / rpo
View on GitHub
Official repository for "Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks"
☆62Aug 8, 2024Updated last year
JoaoLages / RATransformers
View on GitHub
RATransformers 🐭- Make your transformer (like BERT, RoBERTa, GPT-2 and T5) Relation Aware!
☆42Dec 14, 2022Updated 3 years ago
zleizzo / datadeletion
View on GitHub
☆13Feb 24, 2020Updated 6 years ago
allegro-lab / hubble
View on GitHub
Hubble is a suite of fully open-source large language models (LLMs) for the scientific study of LLM memorization.
☆22Apr 15, 2026Updated 3 months ago
Gwinhen / PixelBackdoor
View on GitHub
This is the implementation for CVPR 2022 Oral paper "Better Trigger Inversion Optimization in Backdoor Scanning."
☆24Apr 5, 2022Updated 4 years ago