CarperAI / Algorithm-Distillation-RLHF
☆34 Updated 2 years ago
Alternatives and similar repositories for Algorithm-Distillation-RLHF:
Users who are interested in Algorithm-Distillation-RLHF are comparing it to the repositories listed below.
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 6 months ago
- ☆82 Updated 8 months ago
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ☆109 Updated 7 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 Updated 9 months ago
- ☆20 Updated 9 months ago
- Scalable Opponent Shaping Experiments in JAX ☆24 Updated 11 months ago
- Interpreting how transformers simulate agents performing RL tasks ☆78 Updated last year
- Scaling scaling laws with board games. ☆48 Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆39 Updated 3 months ago
- Repo to reproduce the First-Explore paper results ☆37 Updated 3 months ago
- ☆13 Updated 8 months ago
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" ☆34 Updated 2 years ago
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ☆20 Updated 7 months ago
- Code for Discovered Policy Optimisation (NeurIPS 2022) ☆9 Updated last year
- ☆16 Updated last year
- Code for "Unsupervised Zero-Shot RL via Functional Reward Representations" ☆53 Updated 11 months ago
- ☆31 Updated 3 months ago
- Implements the Messenger environment and EMMA model. ☆23 Updated last year
- Official PyTorch implementation of "Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning" (NeurIPS 20… ☆32 Updated last month
- Learn online intrinsic rewards from LLM feedback ☆35 Updated 3 months ago
- [AutoML'22] Bayesian Generational Population-based Training (BG-PBT) ☆28 Updated 2 years ago
- Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu ☆104 Updated 2 years ago
- ☆56 Updated 2 years ago
- ☆14 Updated 11 months ago
- Intrinsic Motivation from Artificial Intelligence Feedback ☆128 Updated last year
- Evaluating long-term memory of reinforcement learning algorithms ☆141 Updated last year
- ☆31 Updated 2 years ago
- ☆53 Updated 4 months ago
- ☆25 Updated 11 months ago
- ☆74 Updated this week