CarperAI / Algorithm-Distillation-RLHF

☆34

Alternatives and similar repositories for Algorithm-Distillation-RLHF

Users who are interested in Algorithm-Distillation-RLHF are comparing it to the repositories listed below.

vwxyzjn / cleanba
CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL
☆114 Updated 11 months ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆208 Updated 2 years ago
facebookresearch / how-to-autorl
Plug-and-play hydra sweepers for the EA-based multifidelity method DEHB and several population-based training variations, all proven to e…
☆83 Updated last year
google-deepmind / nao_top10
☆19 Updated 2 years ago
ucl-dark / pax
Scalable Opponent Shaping Experiments in JAX
☆24 Updated last year
machelreid / can-wikipedia-help-offline-rl
Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu
☆105 Updated 3 years ago
FLAIROx / cultural-accumulation
☆13 Updated last year
ml-jku / helm
☆54 Updated 9 months ago
iglu-contest / gridworld
A reinforcement learning environment for the IGLU 2022 contest at NeurIPS
☆34 Updated 2 years ago
keraJLi / synthetic-gymnax
Drop-in environment replacements that make your RL algorithm train faster.
☆21 Updated last year
jbloomAus / DecisionTransformerInterpretability
Interpreting how transformers simulate agents performing RL tasks
☆87 Updated last year
luchris429 / discovered-policy-optimisation
Code for Discovered Policy Optimisation (NeurIPS 2022)
☆11 Updated 2 years ago
Sea-Snell / CALM-Dialogue
Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems"
☆34 Updated 2 years ago
iclr-blog-track / iclr-blog-track.github.io
☆45 Updated last year
jcoreyes / evolvingrl
Supplementary Data for Evolving Reinforcement Learning Algorithms
☆46 Updated 4 years ago
upiterbarg / diff_history
[ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus)
☆20 Updated 11 months ago
abdulhaim / LMRL-Gym
☆99 Updated last year
facebookresearch / motif
Intrinsic Motivation from Artificial Intelligence Feedback
☆130 Updated last year
hr0nix / dejax
Accelerated replay buffers in JAX
☆43 Updated 2 years ago
xingchenwan / bgpbt
[AutoML'22] Bayesian Generational Population-based Training (BG-PBT)
☆28 Updated 2 years ago
prajjwal1 / rl_paradigm
☆17 Updated last year
abaheti95 / LoL-RL
Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients
☆26 Updated 10 months ago
facebookresearch / oni
Learn online intrinsic rewards from LLM feedback
☆42 Updated 7 months ago
google-deepmind / csuite
☆44 Updated 10 months ago
ElisevanderPol / symmetrizer
☆31 Updated 4 years ago
sven1977 / dreamer_v3
Implementation (TensorFlow/keras) of the DreamerV3 model-based RL algorithm by Hafner et al. 2023
☆3 Updated 2 years ago
google-deepmind / enn_acme
☆31 Updated 2 years ago
btnorman / First-Explore
Repo to reproduce the First-Explore paper results
☆38 Updated 7 months ago
microsoft / greenlands
Platform to run interactive Reinforcement Learning agents in a Minecraft Server
☆53 Updated last year
snu-mllab / DPPO
Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023)
☆42 Updated last year