pratyushmaini / localizing-memorization
Official Repository for ICML 2023 paper "Can Neural Network Memorization Be Localized?"
☆17 Updated last year
Alternatives and similar repositories for localizing-memorization
Users who are interested in localizing-memorization are comparing it to the repositories listed below:
- ☆53 Updated last year
- ☆20 Updated 7 months ago
- ☆30 Updated 2 months ago
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang ☆15 Updated last year
- Code and data to go with the Zhu et al. paper "An Objective for Nuanced LLM Jailbreaks" ☆23 Updated 2 months ago
- ☆34 Updated last year
- Code for the paper "Evading Black-box Classifiers Without Breaking Eggs" [SaTML 2024] ☆20 Updated 10 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆74 Updated 7 months ago
- A simple PyTorch implementation of influence functions. ☆84 Updated 8 months ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 Updated last year
- ☆41 Updated 3 weeks ago
- Starter kit and data loading code for the Trojan Detection Challenge NeurIPS 2022 competition ☆33 Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆56 Updated last month
- ☆14 Updated 5 years ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆71 Updated this week
- ☆21 Updated last year
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆64 Updated 3 months ago
- ☆33 Updated 6 months ago
- ☆30 Updated 5 months ago
- This is an official repository for "LAVA: Data Valuation without Pre-Specified Learning Algorithms" (ICLR2023). ☆47 Updated 8 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆18 Updated 11 months ago
- ☆19 Updated 7 months ago
- ☆14 Updated 8 months ago
- ☆28 Updated 7 months ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆35 Updated 2 years ago
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors ☆44 Updated this week
- ☆64 Updated 2 years ago
- ☆17 Updated 2 months ago
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆62 Updated 5 months ago