tml-epfl / long-is-more-for-alignment
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
Related projects:
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep"
- Code for "Universal Adversarial Triggers Are Not Universal"
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
- [NeurIPS 2023] Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors
- Codebase for "Decoding Compressed Trust"
- A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643)
- Is In-Context Learning Sufficient for Instruction Following in LLMs?
- Test-time training on nearest neighbors for large language models
- Official repository for "Dataset Inference for LLMs"
- Restoring safety in fine-tuned language models through task arithmetic
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
- Model Editing Can Hurt General Abilities of Large Language Models
- [ICML 2024] Official repository for "Localizing Task Information for Improved Model Merging and Compression"
- Code for the NeurIPS 2023 paper "A Bayesian Approach To Analysing Training Data Attribution In Deep Learning"
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NextGenAISafety @ ICML 2024)
- [ICLR 2024] DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
- Min-K%++: Improved baseline for detecting pre-training data of LLMs (https://arxiv.org/abs/2404.02936)
- Official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models"