John-AI-Lab / Unnatural_Language
The official repository for 'Unnatural Languages Are Not Bugs but Features for LLMs'
☆13 · Updated 3 weeks ago
Alternatives and similar repositories for Unnatural_Language:
Users interested in Unnatural_Language are comparing it to the repositories listed below
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆13 · Updated last week
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆75 · Updated 5 months ago
- Codebase for decoding compressed trust. ☆23 · Updated 10 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆59 · Updated 2 months ago
- Intriguing Properties of Data Attribution on Diffusion Models (ICLR 2024) ☆28 · Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆89 · Updated 10 months ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling ☆26 · Updated 4 months ago
- [EMNLP 2024] Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 · Updated 4 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆18 · Updated last year
- Code for "Universal Adversarial Triggers Are Not Universal."☆16Updated 10 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆43Updated 8 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆73 · Updated last month
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆82 · Updated 8 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 · Updated last month
- [ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs ☆36 · Updated last month
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆29 · Updated 2 months ago
- Code for paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆48 · Updated 11 months ago