Code for Contrastive Preference Learning (CPL)
☆179Nov 22, 2024Updated last year
Alternatives and similar repositories for cpl
Users who are interested in cpl are comparing it to the libraries listed below
- ☆43May 25, 2023Updated 2 years ago
- A lightweight research framework☆28Oct 14, 2025Updated 4 months ago
- Official Codebase for TMLR 2023, Benchmarks and Algorithms for Offline Preference-Based Reward Learning☆20Dec 30, 2022Updated 3 years ago
- Companion code to CoRL 2018 paper: E Bıyık, D Sadigh. "Batch Active Preference-Based Learning of Reward Functions". Conference on Robot L…☆30May 29, 2019Updated 6 years ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents"☆31Oct 12, 2023Updated 2 years ago
- Official codebase for "B-Pref: Benchmarking Preference-Based Reinforcement Learning" contains scripts to reproduce experiments.☆133Nov 3, 2021Updated 4 years ago
- Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human …☆41Mar 26, 2024Updated last year
- ☆282Jan 6, 2025Updated last year
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- Predictable MDP Abstraction for Unsupervised Model-Based RL (ICML 2023)☆32Feb 6, 2023Updated 3 years ago
- ☆37Apr 27, 2023Updated 2 years ago
- Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted)☆167Oct 15, 2023Updated 2 years ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion☆14Mar 17, 2025Updated 11 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆151Feb 14, 2025Updated last year
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer☆29Jul 25, 2023Updated 2 years ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Jul 12, 2023Updated 2 years ago
- Listwise Reward Estimation for Offline Preference-based Reinforcement Learning (ICML 2024)☆17Jun 18, 2024Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"☆186May 25, 2025Updated 9 months ago
- Directional Preference Alignment☆58Sep 23, 2024Updated last year
- ☆15Aug 9, 2021Updated 4 years ago
- Authors' PyTorch implementation of 'Recomposing the Reinforcement Learning Building-Blocks with Hypernetworks' (HypeRL)☆26Jun 9, 2021Updated 4 years ago
- Code accompanying the paper Pretraining Language Models with Human Preferences☆180Feb 13, 2024Updated 2 years ago
- HIQL: Offline Goal-Conditioned RL with Latent States as Actions (NeurIPS 2023)☆93Dec 1, 2024Updated last year
- Source code for the paper "Policy Architectures for Compositional Generalization in Control"☆30May 19, 2022Updated 3 years ago
- ☆27Mar 13, 2024Updated last year
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆73Jun 25, 2024Updated last year
- Extreme Q-Learning: Max Entropy RL without Entropy☆87Feb 14, 2023Updated 3 years ago
- [ICLR 2022] Official implementation of paper: Efficient Learning of Safe Driving Policy via Human-AI Copilot Optimization☆54Dec 23, 2022Updated 3 years ago
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)☆42Jul 20, 2024Updated last year
- MTM Masked Trajectory Models for Prediction, Representation, and Control.☆162Dec 16, 2025Updated 2 months ago
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆28Dec 19, 2023Updated 2 years ago
- Plan✕ is a platform for creating and publishing digital planning services☆17Feb 26, 2026Updated last week
- ☆42Mar 19, 2021Updated 4 years ago
- Experiments to assess SPADE on different LLM pipelines.☆17Apr 7, 2024Updated last year
- Representation Learning in RL☆13Jun 1, 2022Updated 3 years ago
- Reinforcement Learning via Regressing Relative Rewards☆39Dec 12, 2024Updated last year
- Benchmarking RL generalization in an interpretable way.☆175Nov 20, 2025Updated 3 months ago
- ☆35Jan 29, 2023Updated 3 years ago
- Official implementation of TBA for async LLM post-training.☆29Nov 5, 2025Updated 3 months ago