Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
☆118 · Oct 23, 2023 · Updated 2 years ago
Alternatives and similar repositories for RLPHF
Users that are interested in RLPHF are comparing it to the libraries listed below
- ☆11 · Sep 19, 2025 · Updated 5 months ago
- ☆20 · Jan 15, 2024 · Updated 2 years ago
- Bridging Retrieval and Inference through Evidence Fusion ☆12 · Oct 20, 2025 · Updated 4 months ago
- Official code and dataset repository of KoBBQ (TACL 2024) ☆19 · May 13, 2024 · Updated last year
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆33 · Dec 14, 2023 · Updated 2 years ago
- ☆282 · Jan 6, 2025 · Updated last year
- Rewarded soups official implementation ☆62 · Sep 27, 2023 · Updated 2 years ago
- [NAACL 2024] Vision language model that reduces hallucinations through self-feedback guided revision. Visualizes attentions on image feat… ☆47 · Aug 21, 2024 · Updated last year
- Directional Preference Alignment ☆58 · Sep 23, 2024 · Updated last year
- personalized-llms with allen institute ☆14 · Jun 22, 2023 · Updated 2 years ago
- Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024 ☆18 · Mar 25, 2025 · Updated 11 months ago
- ☆30 · Feb 16, 2024 · Updated 2 years ago
- ☆19 · Oct 2, 2023 · Updated 2 years ago
- Code for "Goodtriever: Toxicity Mitigation with Retrieval-augmented Language Models" ☆25 · May 30, 2024 · Updated last year
- PRODIGy is a collection of dialogues in which each conversation is aligned with speaker profile representations. ☆19 · Jan 8, 2025 · Updated last year
- The Prism Alignment Project ☆90 · Apr 25, 2024 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆697 · Feb 16, 2026 · Updated 2 weeks ago
- ☆27 · Dec 9, 2024 · Updated last year
- ☆24 · Mar 4, 2024 · Updated 2 years ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆85 · Mar 7, 2025 · Updated 11 months ago
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- Framework for controlling demographic biases in NLG (using adversarial prompts) ☆20 · Jun 12, 2023 · Updated 2 years ago
- ☆32 · Aug 9, 2024 · Updated last year
- [COLING 2022] Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization ☆25 · Mar 28, 2024 · Updated last year
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and inexpensive to use. Specifically d… ☆311 · Nov 11, 2023 · Updated 2 years ago
- Recipes to train reward model for RLHF. ☆1,517 · Apr 24, 2025 · Updated 10 months ago
- Modular Pluralism @ EMNLP 2024 ☆23 · Sep 20, 2024 · Updated last year
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization ☆29 · Sep 12, 2024 · Updated last year
- The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization" ☆10 · Jun 23, 2024 · Updated last year
- [TACL 2024] Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis ☆11 · Nov 14, 2024 · Updated last year
- ☆10 · Jan 20, 2024 · Updated 2 years ago
- ☆12 · Apr 24, 2024 · Updated last year
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023) ☆13 · Dec 21, 2023 · Updated 2 years ago
- 3rd-place submission to the NeurIPS MineRL competition 2019 ☆10 · Mar 24, 2023 · Updated 2 years ago
- A recipe for online RLHF and online iterative DPO. ☆540 · Dec 28, 2024 · Updated last year
- Augmenting Statistical Models with Natural Language Parameters ☆29 · Sep 17, 2024 · Updated last year
- ☆23 · Mar 8, 2024 · Updated last year
- This repository contains the official code for the paper: "Prompt Injection: Parameterization of Fixed Inputs" ☆32 · Sep 13, 2024 · Updated last year
- Logic grid puzzle ("zebra puzzle") generator and solver ☆30 · Mar 1, 2024 · Updated 2 years ago