declare-lab / safety-arithmetic
☆12 Updated 2 months ago
Alternatives and similar repositories for safety-arithmetic:
Users who are interested in safety-arithmetic compare it to the repositories listed below.
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆107 Updated last year
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling ☆29 Updated 8 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 Updated 4 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆73 Updated 10 months ago
- The rule-based evaluation subset and code implementation of Omni-MATH ☆18 Updated 3 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆46 Updated 9 months ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆63 Updated last year
- Official code implementation of SKU, Accepted by ACL 2024 Findings ☆13 Updated 3 months ago
- [WWW2024 Oral] Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering ☆9 Updated last month
- ☆3 Updated 2 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆35 Updated 8 months ago
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability. ☆14 Updated 2 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆24 Updated 4 months ago
- BeHonest: Benchmarking Honesty in Large Language Models ☆31 Updated 7 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆28 Updated last year
- [EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences" ☆19 Updated 6 months ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆11 Updated 5 months ago
- Knowledge Unlearning for Large Language Models ☆25 Updated last week
- ☆16 Updated 4 months ago
- ☆17 Updated 5 months ago
- ☆59 Updated 7 months ago
- ☆29 Updated 3 months ago
- [ACL'24] WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations ☆12 Updated 6 months ago
- [Preprint] An inference-time decoding strategy with adaptive foresight sampling ☆88 Updated 2 weeks ago
- Code and Data for the paper "Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works". ☆16 Updated 8 months ago
- [arxiv:2412.04905] DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling ☆13 Updated 3 months ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆12 Updated last month
- Codebase for Instruction Following without Instruction Tuning ☆34 Updated 6 months ago
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor ☆29 Updated last year
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts ☆38 Updated 6 months ago