ShanglunFengatETHZ/PrivacyBackdoor

Privacy backdoors

☆50

Alternatives and similar repositories for PrivacyBackdoor

Users who are interested in PrivacyBackdoor are comparing it to the repositories listed below.

shiyuchengTJU / PAR
☆14 · Mar 23, 2023 · Updated 3 years ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆98 · Nov 17, 2024 · Updated last year
tml-epfl / llm-past-tense
Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025]
☆79 · Jan 23, 2025 · Updated last year
BatsResearch / cross-lingual-detox
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024
☆18 · Mar 25, 2025 · Updated last year
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆67 · Apr 24, 2024 · Updated 2 years ago
weizeming / momentum-attack-llm
☆25 · Jan 17, 2025 · Updated last year
ALT-JS / OthelloSAE
CS194-196 Course Project
☆14 · Feb 20, 2025 · Updated last year
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆18 · Sep 10, 2024 · Updated last year
RJ-T / NIPS2022_EP_BNP
Official Implementation of NIPS 2022 paper Pre-activation Distributions Expose Backdoor Neurons
☆15 · Jan 13, 2023 · Updated 3 years ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 · Apr 17, 2024 · Updated 2 years ago
ai-forever / LIBRA
☆22 · Jun 11, 2026 · Updated last month
renjie3 / MemAttn
☆16 · Feb 23, 2025 · Updated last year
goombalab / Gather-and-Aggregate
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
mtkresearch / shortest-path-diffusion
Official code for the paper "Image generation with shortest path diffusion" accepted at ICML 2023.
☆24 · Jul 10, 2023 · Updated 3 years ago
tribhuvanesh / prediction-poisoning
Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks (ICLR '20)
☆33 · Nov 4, 2020 · Updated 5 years ago
jinghuichen / AWM
Github repo for One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NeurIPS 2022)
☆15 · Jan 3, 2023 · Updated 3 years ago
FLAIROx / cultural-accumulation
☆16 · Jul 16, 2024 · Updated 2 years ago
fra31 / robust-finetuning
Code relative to "Adversarial robustness against multiple and single $l_p$-threat models via quick fine-tuning of robust classifiers"
☆19 · Nov 30, 2022 · Updated 3 years ago
elehman16 / exposing_patient_data_release
☆53 · May 2, 2021 · Updated 5 years ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆40 · Dec 24, 2023 · Updated 2 years ago
ZiyueWang25 / llm-security-challenge
Can Large Language Models Solve Security Challenges? We test LLMs' ability to interact and break out of shell environments using the Over…
☆13 · Aug 21, 2023 · Updated 2 years ago
kzhao5 / ModelExtractionPapers
Model Extraction(Stealing) Attacks and Defenses on Machine Learning Models Literature
☆32 · Sep 25, 2024 · Updated last year
Lyz1213 / BadEdit
☆38 · Oct 17, 2024 · Updated last year
Heidelberg-NLP / CC-SHAP
Code for "On Measuring Faithfulness of Natural Language Explanations"
☆23 · Jul 14, 2026 · Updated last week
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
JinyiW / GuidedDiffusionPur
☆64 · Aug 9, 2023 · Updated 2 years ago
uchicago-sandlab / naturalbackdoors
Code for identifying natural backdoors in existing image datasets.
☆15 · Aug 24, 2022 · Updated 3 years ago
ethz-spylab / satml-llm-ctf
Code used to run the platform for the LLM CTF colocated with SaTML 2024
☆29 · Mar 20, 2024 · Updated 2 years ago
zlijingtao / ResSFL
Official Repository for ResSFL (accepted by CVPR '22)
☆26 · Jun 24, 2022 · Updated 4 years ago
McGill-NLP / AdversarialTriggers
TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models
☆19 · Aug 17, 2025 · Updated 11 months ago
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆24 · Dec 8, 2024 · Updated last year
ml-postech / gradient-inversion-generative-image-prior
☆50 · Dec 29, 2021 · Updated 4 years ago
XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆18 · Oct 22, 2024 · Updated last year
mt-upc / logit-explanations
☆18 · Jun 19, 2023 · Updated 3 years ago
kvfrans / jaxtransformer
Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc...
☆16 · May 28, 2025 · Updated last year
UCSB-NLP-Chang / SemanticSmooth
Implementation of paper 'Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing'
☆24 · Jun 9, 2024 · Updated 2 years ago
eth-sri / llmprivacy
☆75 · Feb 16, 2025 · Updated last year
lucidrains / GAF-microbatch-pytorch
Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch
☆25 · Jan 21, 2025 · Updated last year
google-research / fooling-feature-visualizations
Code for "Don't trust your eyes: on the (un)reliability of feature visualizations" (ICML 2024)
☆34 · Nov 15, 2023 · Updated 2 years ago