hlzhang109 / impossibility-watermark
☆20 · Updated 2 months ago
Related projects
Alternatives and complementary repositories for impossibility-watermark
- Official implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs" ☆43 · Updated 9 months ago
- Code for watermarking language models ☆72 · Updated 2 months ago
- ☆53 · Updated last year
- ☆25 · Updated 5 months ago
- ☆21 · Updated 5 months ago
- Code and data for the paper "A Semantic Invariant Robust Watermark for Large Language Models" (ICLR 2024) ☆25 · Updated last week
- ☆12 · Updated 8 months ago
- [NeurIPS 2023] Differentially Private Image Classification by Learning Priors from Random Processes ☆11 · Updated last year
- ☆32 · Updated 11 months ago
- Implementation of the paper "A Watermark for Large Language Models" by Kirchenbauer, Geiping, et al. ☆23 · Updated last year
- Code repo of the paper "Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis" (https://arxiv.org/abs/2406.10794… ☆12 · Updated 3 months ago
- Landing page for TOFU ☆98 · Updated 5 months ago
- Official implementation of the ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Uns…" ☆58 · Updated 2 weeks ago
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆41 · Updated 6 months ago
- "In-Context Unlearning: Language Models as Few Shot Unlearners". Martin Pawelczyk, Seth Neel*, and Himabindu Lakkaraju*; ICML 2024. ☆15 · Updated last year
- ☆20 · Updated 9 months ago
- ☆49 · Updated last year
- Code for the NeurIPS 2024 paper "Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models" ☆28 · Updated last month
- ☆38 · Updated last year
- ☆26 · Updated 3 weeks ago
- ☆78 · Updated last week
- Robust natural language watermarking using invariant features ☆25 · Updated last year
- Official implementation of WEvade ☆37 · Updated 8 months ago
- Certified robustness "for free" using off-the-shelf diffusion models and classifiers ☆36 · Updated last year
- Code for the paper "BadPrompt: Backdoor Attacks on Continuous Prompts" ☆36 · Updated 4 months ago
- ☆19 · Updated 3 weeks ago
- Official code of the paper "A Closer Look at Machine Unlearning for Large Language Models" ☆13 · Updated last month
- Backdoor Safety Tuning (NeurIPS 2023 & 2024 Spotlight) ☆24 · Updated this week
- ☆15 · Updated 6 months ago
- Official code for "Baseline Defenses for Adversarial Attacks Against Aligned Language Models" ☆20 · Updated last year