XuandongZhao / pf-decoding
Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs
☆11 · Updated 9 months ago
Related projects
Alternatives and complementary repositories for pf-decoding
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆60 · Updated 3 weeks ago
- Official Implementation of the paper "Three Bricks to Consolidate Watermarks for LLMs" ☆43 · Updated 9 months ago
- List of T2I safety papers, updated daily, welcome to discuss using Discussions ☆45 · Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 · Updated 5 months ago
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View ☆29 · Updated last month
- [Arxiv 2024] Adversarial attacks on multimodal agents ☆39 · Updated 4 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ☆62 · Updated last month
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". ☆83 · Updated 3 weeks ago
- [ICLR 2024] Provable Robust Watermarking for AI-Generated Text ☆26 · Updated 11 months ago
- CMD: a framework for Context-aware Model self-Detoxification (EMNLP2024 Main) ☆14 · Updated last month
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆21 · Updated 4 months ago
- Codebase for decoding compressed trust. ☆20 · Updated 6 months ago
- Code for our paper "Defending ChatGPT against Jailbreak Attack via Self-Reminder" in NMI. ☆43 · Updated last year
- The official implementation of ECCV'24 paper "To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Uns… ☆58 · Updated 2 weeks ago
- Official code implementation of SKU, Accepted by ACL 2024 Findings ☆11 · Updated 6 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆47 · Updated last month
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance" ☆31 · Updated 7 months ago
- code for ACL24 "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆15 · Updated 6 months ago
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆67 · Updated 11 months ago
- Code repo of our paper Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis (https://arxiv.org/abs/2406.10794… ☆12 · Updated 3 months ago
- [ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast ☆90 · Updated 7 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆39 · Updated 3 months ago