John-AI-Lab / Unnatural_Language
The official repository of 'Unnatural Languages Are Not Bugs but Features for LLMs'
☆15 Updated last month
Alternatives and similar repositories for Unnatural_Language:
Users who are interested in Unnatural_Language are comparing it to the repositories listed below.
- ☆15 Updated last week
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 Updated 5 months ago
- ☆20 Updated last month
- ☆27 Updated 10 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆77 Updated 6 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆43 Updated last week
- Directional Preference Alignment ☆57 Updated 7 months ago
- ☆34 Updated 6 months ago
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024) ☆60 Updated 3 months ago
- Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆15 Updated 2 weeks ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆17 Updated 11 months ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 Updated 10 months ago
- ☆21 Updated 9 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 Updated 7 months ago
- Codebase for decoding compressed trust. ☆23 Updated 11 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆53 Updated 3 weeks ago
- ☆11 Updated last week
- Code for LLM_Catastrophic_Forgetting via SAM. ☆10 Updated 10 months ago
- ☆13 Updated last year
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task ☆33 Updated last week
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆36 Updated last week
- ☆14 Updated 6 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆35 Updated 2 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 Updated 2 months ago
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning” ☆16 Updated last year
- [ECCV 2024] Official PyTorch Implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆80 Updated last year
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 Updated last year
- The official implementation of "LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation" ☆19 Updated this week
- ☆21 Updated last month
- Restore safety in fine-tuned language models through task arithmetic ☆28 Updated last year