ZhaolinGao / REBEL
☆28 · Updated 2 months ago
Alternatives and similar repositories for REBEL:
Users interested in REBEL also compare it to the repositories listed below:
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆41 · Updated 7 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 · Updated 5 months ago
- Dataset Reset Policy Optimization ☆30 · Updated 10 months ago
- VC-FB and MC-FB algorithms from "Zero-Shot Reinforcement Learning from Low Quality Data" (NeurIPS 2024) ☆13 · Updated last month
- PyTorch Package For Quasimetric Learning ☆41 · Updated 3 months ago
- Learning to Modulate pre-trained Models in RL (Decision Transformer, LoRA, Fine-tuning) ☆54 · Updated 4 months ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆28 · Updated last year
- A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. ☆35 · Updated last month
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆48 · Updated 2 months ago
- Codebase for "Uni[MASK]: Unified Inference in Sequential Decision Problems" ☆54 · Updated 7 months ago
- Learn online intrinsic rewards from LLM feedback ☆34 · Updated 2 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆27 · Updated 7 months ago
- Rewarded soups official implementation ☆55 · Updated last year
- Code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity" ☆40 · Updated last year
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ☆20 · Updated 6 months ago
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆46 · Updated last year
- We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effe… ☆21 · Updated last year
- JAX implementation of VQVAE/VQGAN autoencoders (+FSQ) ☆23 · Updated 8 months ago
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆26 · Updated last year
- BASALT Benchmark datasets, evaluation code and agent training example. ☆20 · Updated last year
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" ☆32 · Updated last week
- Implements the Messenger environment and EMMA model. ☆23 · Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 10 months ago