Twilight92z / Quantize-Watermark
☆20 Updated last year
Alternatives and similar repositories for Quantize-Watermark:
Users who are interested in Quantize-Watermark are comparing it to the repositories listed below.
- Codebase for decoding compressed trust. ☆23 Updated 9 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆56 Updated last month
- ☆29 Updated 4 months ago
- ☆17 Updated 2 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 Updated 4 months ago
- Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning" ☆18 Updated 4 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆69 Updated 3 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆89 Updated 8 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆52 Updated 4 months ago
- ☆27 Updated 3 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆69 Updated 4 months ago
- Official Code and data for ACL 2024 finding, "An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models" ☆15 Updated 3 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆71 Updated 7 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆56 Updated 4 months ago
- Code and data for paper "A Semantic Invariant Robust Watermark for Large Language Models" accepted by ICLR 2024. ☆27 Updated 3 months ago
- ☆20 Updated 7 months ago
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆38 Updated last month
- SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors ☆43 Updated 7 months ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024) ☆35 Updated 3 months ago
- ☆37 Updated last year
- ☆17 Updated 2 months ago
- ☆45 Updated 7 months ago
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆23 Updated 7 months ago
- ☆40 Updated last year
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ☆35 Updated last week
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 Updated 3 months ago
- code for ACL24 "MELoRA: Mini-Ensemble Low-Rank Adapter for Parameter-Efficient Fine-Tuning" ☆16 Updated this week
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 Updated last year
- ☆14 Updated 4 months ago