sail-sg / D-TRAK
Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024)
☆23 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for D-TRAK
- Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP" ☆28 · Updated this week
- ☆15 · Updated last week
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View ☆29 · Updated last month
- [arXiv 2024] Adversarial attacks on multimodal agents ☆39 · Updated 4 months ago
- On Memorization in Diffusion Models ☆23 · Updated last year
- What do we learn from inverting CLIP models? ☆45 · Updated 8 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 · Updated 4 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆60 · Updated 3 weeks ago
- Implementation of paper 'Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference' [NeurIPS'24… ☆13 · Updated 5 months ago
- ☆27 · Updated 9 months ago
- ☆30 · Updated 2 weeks ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆17 · Updated 8 months ago
- ☆13 · Updated 8 months ago
- ☆40 · Updated last year
- The official code of the paper "A Closer Look at Machine Unlearning for Large Language Models". ☆13 · Updated last month
- Code for the paper "The Journey, Not the Destination: How Data Guides Diffusion Models" ☆19 · Updated 11 months ago
- ☆26 · Updated 3 weeks ago
- ☆38 · Updated last year
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆31 · Updated 7 months ago
- ☆16 · Updated last year
- Official implementation of NeurIPS'24 paper "Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Model… ☆26 · Updated 2 weeks ago
- ☆21 · Updated last month
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆39 · Updated 3 months ago
- Code for NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆28 · Updated last month
- Official repo of Progressive Data Expansion: data, code and evaluation ☆27 · Updated last year
- Code for NeurIPS'23 paper "A Bayesian Approach To Analysing Training Data Attribution In Deep Learning" ☆14 · Updated 10 months ago
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆45 · Updated 3 months ago
- [NeurIPS 2023 Spotlight] Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training ☆28 · Updated 11 months ago
- ☆57 · Updated last month
- The official implementation of ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Uns… ☆58 · Updated 2 weeks ago