louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
☆92Apr 17, 2024Updated last year
Alternatives and similar repositories for awesome-rlhf
Users that are interested in awesome-rlhf are comparing it to the libraries listed below
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆28Dec 19, 2023Updated 2 years ago
- Code to reproduce the experiments in The Mirage of Action-Dependent Baselines in Reinforcement Learning.☆17Aug 2, 2018Updated 7 years ago
- OpenLLMDE: An open source data engineering framework for LLMs☆18Sep 9, 2023Updated 2 years ago
- [ICML 2022] The official implementation of DWBC in "Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations"☆35Jan 5, 2023Updated 3 years ago
- Lipschitz Lifelong RL☆11Nov 6, 2020Updated 5 years ago
- Cross-domain word representation learning☆10May 23, 2015Updated 10 years ago
- ☆25Apr 24, 2019Updated 6 years ago
- ☆27Mar 13, 2024Updated last year
- A recipe for online RLHF and online iterative DPO.☆539Dec 28, 2024Updated last year
- This repo supports automatic line plots for multi-seed event files from TensorBoard☆12Jun 23, 2022Updated 3 years ago
- This is the pytorch implementation of the UAI2023 paper "A Trajectory is Worth Three Sentences: Multimodal Transformer for Offline Reinf…☆11Oct 9, 2023Updated 2 years ago
- Unofficial PyTorch Implementation of StarGAN-ZSVC☆14Aug 5, 2021Updated 4 years ago
- Facebear's minimal implementation of SBAC (Soft behavior regularized actor critic, NIPS22 offline RL workshop)☆12Jul 4, 2022Updated 3 years ago
- Implementation of the Playground environment from the paper Language as a Cognitive Tool to Imagine Goals in Curiosity-Driven Exploration.☆11Mar 5, 2021Updated 4 years ago
- Directional Preference Alignment☆58Sep 23, 2024Updated last year
- [WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful…☆20Jan 5, 2025Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated last year
- A curated list of reinforcement learning with human feedback resources (continually updated)☆4,296Dec 9, 2025Updated 2 months ago
- RewardBench: the first evaluation tool for reward models.☆687Jan 31, 2026Updated 2 weeks ago
- Code for the paper Task Agnostic Morphology Evolution.☆20May 25, 2021Updated 4 years ago
- EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System☆15Mar 31, 2019Updated 6 years ago
- Experiment for Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning☆26Jan 16, 2023Updated 3 years ago
- Converts Mandarin Chinese pinyin notation to IPA (international phonetic alphabet) notation☆18Nov 28, 2023Updated 2 years ago
- ☆16Oct 5, 2021Updated 4 years ago
- A Python3 program for converting Japanese words and numbers into phonemes.☆18Apr 24, 2018Updated 7 years ago
- Code for the paper Novelty Search in Representational Space for Sample Efficient Exploration presented at NeurIPS 2020.☆14Jul 16, 2024Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"☆186May 25, 2025Updated 8 months ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆18Dec 16, 2024Updated last year
- This is the source code of RPG (Reward-Randomized Policy Gradient)☆42Sep 1, 2022Updated 3 years ago
- Code for Contrastive Preference Learning (CPL)☆178Nov 22, 2024Updated last year
- ☆15Oct 20, 2020Updated 5 years ago
- The agent tying together the components of Project Happy Meal☆21Apr 21, 2019Updated 6 years ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct☆191Jan 16, 2025Updated last year
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- WAFR 2024: Multi-modal variational inference in multi-agent interaction enabled by VAE + differentiable Nash game solver.☆24Nov 10, 2025Updated 3 months ago
- ☆23Oct 20, 2023Updated 2 years ago
- [ICLR 2023 Oral] The official implementation of SQL and EQL in "Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Reg…☆47Jul 27, 2023Updated 2 years ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆199Dec 16, 2023Updated 2 years ago
- TensorFlow implementation for our paper "Learning Long-Term Reward Redistribution via Randomized Return Decomposition"☆19Mar 17, 2022Updated 3 years ago