Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF"
☆33Dec 14, 2023Updated 2 years ago
Alternatives and similar repositories for hidden-context
Users that are interested in hidden-context are comparing it to the libraries listed below
- ☆21Dec 17, 2020Updated 5 years ago
- Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters.☆25Sep 26, 2020Updated 5 years ago
- ☆11Mar 13, 2023Updated 2 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment☆16Aug 6, 2024Updated last year
- Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences"☆31Jul 27, 2021Updated 4 years ago
- This repo supports automatic line plots for multi-seed event files from TensorBoard☆12Jun 23, 2022Updated 3 years ago
- Baselines for Neural MMO -- new users should treat this repo as a starter project☆51Jul 29, 2024Updated last year
- Rewarded soups official implementation☆62Sep 27, 2023Updated 2 years ago
- PyTorch implementation of "The Option Keyboard: Combining Skills in Reinforcement Learning" (NeurIPS 2019)☆12Jul 2, 2020Updated 5 years ago
- Code to reproduce the empirical results in the research paper☆38Oct 12, 2021Updated 4 years ago
- Code for our NeurIPS 2020 paper Improving Generalization in Reinforcement Learning with Mixture Regularization☆35Oct 22, 2020Updated 5 years ago
- Code to reproduce the experiments in The Mirage of Action-Dependent Baselines in Reinforcement Learning.☆17Aug 2, 2018Updated 7 years ago
- Code for the paper Novelty Search in Representational Space for Sample Efficient Exploration presented at NeurIPS 2020.☆14Jul 16, 2024Updated last year
- Framework for writing bots that play Hanabi.☆37May 16, 2019Updated 6 years ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆118Oct 23, 2023Updated 2 years ago
- This is the source code of RPG (Reward-Randomized Policy Gradient)☆42Sep 1, 2022Updated 3 years ago
- OpenLLMDE: An open source data engineering framework for LLMs☆18Sep 9, 2023Updated 2 years ago
- Code for Model-Free Opponent Shaping (ICML 2022)☆20Nov 18, 2022Updated 3 years ago
- Implementation of the Model-Based Meta-Policy-Optimization (MB-MPO) algorithm☆44Nov 15, 2018Updated 7 years ago
- Code and data for the paper "Bridging RL Theory and Practice with the Effective Horizon"☆51Jun 26, 2024Updated last year
- ☆20Apr 10, 2018Updated 7 years ago
- Implementing the REINFORCE algorithm on Pong, Lunar Lander and CartPole + Medium Article☆23Nov 24, 2020Updated 5 years ago
- Author's PyTorch implementation of SR-DICE for marginalized importance sampling☆28Dec 7, 2021Updated 4 years ago
- TensorFlow implementation for our paper "Learning Long-Term Reward Redistribution via Randomized Return Decomposition"☆19Mar 17, 2022Updated 3 years ago
- ☆18Apr 17, 2019Updated 6 years ago
- Source code of "Variational Imitation Learning with Diverse-quality Demonstrations" in ICML 2020. This github repository includes python …☆20Aug 16, 2021Updated 4 years ago
- (NeurIPS '22) LISA: Learning Interpretable Skill Abstractions - A framework for unsupervised skill learning using Imitation☆29Feb 22, 2023Updated 3 years ago
- Cross-Domain Imitation Learning via Optimal Transport☆25Jun 24, 2022Updated 3 years ago
- Official repository for paper "Conservative Offline Distributional Reinforcement Learning" (NeurIPS 2021)☆22Aug 1, 2021Updated 4 years ago
- Public Release of Plan2vec Implementation in PyTorch☆57Oct 28, 2022Updated 3 years ago
- (Experimental) Inverse reinforcement learning from trajectories generated by multiple agents with different (but correlated) rewards☆28Jun 20, 2019Updated 6 years ago
- Directional Preference Alignment☆58Sep 23, 2024Updated last year
- Code base for paper: Reparameterized Policy Learning for Multimodal Trajectory Optimization☆27Jul 19, 2023Updated 2 years ago
- Dataset Reset Policy Optimization☆31Apr 12, 2024Updated last year
- The Arcade Learning Environment (ALE) -- a platform for AI research.☆24Sep 18, 2024Updated last year
- Code for the ICML 2021 paper "Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation", Haoxi…☆68Oct 18, 2021Updated 4 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆73Jun 25, 2024Updated last year
- Implementation of the skill discovery algorithm described in ICLR submission "Option Discovery using Deep Skill Chaining"☆30Sep 24, 2019Updated 6 years ago
- Rapprentice: software for teaching robots by example☆33Oct 1, 2013Updated 12 years ago