VITA-Group / Robust_Weight_Signatures

[ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang

☆16

Alternatives and similar repositories for Robust_Weight_Signatures

Users who are interested in Robust_Weight_Signatures are comparing it to the repositories listed below

mireshghallah / neighborhood-curvature-mia
☆23 Updated 2 years ago
modestyachts / cifar-10.2
Host CIFAR-10.2 Data Set
☆13 Updated 4 years ago
UCSC-VLAA / AttnGCG-attack
☆19 Updated 4 months ago
shizhediao / Black-Box-Prompt-Learning
Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models"
☆56 Updated 2 years ago
weichen-yu / LM-Extraction
☆43 Updated 2 years ago
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆43 Updated 2 years ago
milesaturpin / cot-unfaithfulness
☆49 Updated 2 years ago
ejones313 / auditing-llms
☆58 Updated 2 years ago
tanganke / opcm
Official code repo for the paper "Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging"
☆20 Updated 2 weeks ago
EnnengYang / RepresentationSurgery
Representation Surgery for Multi-Task Model Merging. ICML, 2024.
☆46 Updated last year
vfleaking / PTST
Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"
☆20 Updated last month
princeton-nlp / benign-data-breaks-safety
☆41 Updated last year
JasonForJoy / Model-Editing-Hurt
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
☆37 Updated 5 months ago
thestephencasper / latent_adversarial_training
☆23 Updated last year
Vaidehi99 / InfoDeletionAttacks
☆46 Updated 8 months ago
locuslab / T-MARS
Code for T-MARS data filtering
☆35 Updated 2 years ago
Model-GLUE / Model-GLUE
☆18 Updated last year
Jayfeather1024 / Backdoor-Enhanced-Alignment
☆23 Updated 10 months ago
ethz-spylab / rlhf-poisoning
Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback"
☆61 Updated last year
liuchen11 / AdversaryLossLandscape
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them [NeurIPS 2020]
☆36 Updated 4 years ago
locuslab / acr-memorization
☆37 Updated 10 months ago
ethz-spylab / realistic-adv-examples
Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024]
☆21 Updated last year
pdejorge / N-FGSM
Official repo for the paper "Make Some Noise: Reliable and Efficient Single-Step Adversarial Training" (https://arxiv.org/abs/2202.01181)
☆25 Updated 3 years ago
pratyushmaini / llm_dataset_inference
Official Repository for Dataset Inference for LLMs
☆41 Updated last year
kyleliang919 / Uncovering-the-Connections-BetweenAdversarial-Transferability-and-Knowledge-Transferability
Code for the ICML 2021 paper exploring the relationship between adversarial transferability and knowledge transferability.
☆17 Updated 2 years ago
UCSC-VLAA / vllm-safety-benchmark
[ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs"
☆83 Updated last year
r-three / mats
☆31 Updated last year
MadryLab / smoothed-vit
Certified Patch Robustness via Smoothed Vision Transformers
☆42 Updated 3 years ago
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 Updated last year
deeplearning-wisc / haloscope
Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
☆60 Updated 6 months ago