declare-lab / safety-arithmetic
☆10Updated 5 months ago
Related projects ⓘ
Alternatives and complementary repositories for safety-arithmetic
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆26Updated 2 weeks ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue☆33Updated this week
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆17Updated 2 weeks ago
- DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling☆28Updated 4 months ago
- Codebase for Instruction Following without Instruction Tuning☆32Updated last month
- ☆8Updated 6 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆30Updated 6 months ago
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆54Updated last week
- Data and code for our paper "Why Does the Effective Context Length of LLMs Fall Short?"☆64Updated last week
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆68Updated 5 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆97Updated 7 months ago
- ☆29Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆50Updated 6 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)☆62Updated last month
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆39Updated last month
- PyTorch implementation of StableMask (ICML'24)☆12Updated 4 months ago
- [EMNLP 2024 Findings] To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models☆19Updated last week
- [ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"☆94Updated 7 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆39Updated 3 months ago
- AbstainQA, ACL 2024☆19Updated last month
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models☆22Updated 5 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆26Updated 5 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".☆24Updated 3 weeks ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"☆31Updated 4 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆45Updated 7 months ago
- Restore safety in fine-tuned language models through task arithmetic☆26Updated 7 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆37Updated last month
- ☆20Updated last year
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆30Updated 3 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated last year