inspire-group / DP-RandPLinks

[NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes

☆12

Alternatives and similar repositories for DP-RandP

Users that are interested in DP-RandP are comparing it to the libraries listed below

Sorting:

RU-System-Software-and-Security / NONE
☆10Updated 3 years ago
BrachioLab / adversarial_prompting
☆53Updated 2 years ago
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆20Updated 2 years ago
AISafety-HKUST / Backdoor_Safety_Tuning
Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight)
☆27Updated last year
jinhaoduan / SecMI
[ICML 2023] Are Diffusion Models Vulnerable to Membership Inference Attacks?
☆42Updated last year
jinghuichen / AWM
Github repo for One-shot Neural Backdoor Erasing via Adversarial Weight Masking (NeurIPS 2022)
☆15Updated 2 years ago
csdongxian / ANP_backdoor
Codes for NeurIPS 2021 paper "Adversarial Neuron Pruning Purifies Backdoored Deep Models"
☆59Updated 2 years ago
YuxinWenRick / canary-in-a-coalmine
☆32Updated 2 years ago
bboylyg / RNP
Reconstructive Neuron Pruning for Backdoor Defense (ICML 2023)
☆39Updated last year
qingjiesjtu / USC
This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction.
☆63Updated 11 months ago
umd-huang-lab / VLM-Poisoning
Code for Neurips 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models"
☆57Updated 10 months ago
SCLBD / DBD
☆33Updated 3 years ago
AI-secure / MMDT
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
☆24Updated 8 months ago
OPTML-Group / Unlearn-Sparse
[NeurIPS23 (Spotlight)] "Model Sparsity Can Simplify Machine Unlearning" by Jinghan Jia*, Jiancheng Liu*, Parikshit Ram, Yuguang Yao, Gao…
☆81Updated last year
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆10Updated last year
reds-lab / CLIP-MIA
This is an official repository for Practical Membership Inference Attacks Against Large-Scale Multi-Modal Models: A Pilot Study (ICCV2023…
☆24Updated 2 years ago
ethz-spylab / autoadvexbench
☆33Updated 6 months ago
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆62Updated last year
uchicago-sandlab / naturalbackdoors
Code for identifying natural backdoors in existing image datasets.
☆15Updated 3 years ago
wagner-group / MarkMyWords
☆32Updated last year
cleverhans-lab / dataset-inference
[ICLR'21] Dataset Inference for Ownership Resolution in Machine Learning
☆32Updated 3 years ago
rmin2000 / adv_tracing
Identification of the Adversary from a Single Adversarial Example (ICML 2023)
☆10Updated last year
Jimmy-di / camouflage-poisoning
Camouflage poisoning via machine unlearning
☆18Updated 5 months ago
ZhentingWang / DIAGNOSIS
☆23Updated last year
inspire-group / tta_risk
☆14Updated 2 years ago
facebookresearch / jailbreak-objectives
Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks"
☆35Updated 11 months ago
RylanSchaeffer / AstraFellowship-When-Do-VLM-Image-Jailbreaks-Transfer
Code for ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
☆34Updated 6 months ago
SolidShen / RIPPLE_official
☆20Updated last year
centerforaisafety / tdc2023-starter-kit
This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition.
☆90Updated last year
HanxunH / CognitiveDistillation
[ICLR2023] Distilling Cognitive Backdoor Patterns within an Image
☆36Updated last month