Sea-Snell / Implicit-Language-Q-Learning

Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"

☆208

Alternatives and similar repositories for Implicit-Language-Q-Learning

Users interested in Implicit-Language-Q-Learning are comparing it to the libraries listed below.


tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆182 · Updated last year
tianjunz / HIR
☆159 · Updated 2 years ago
abdulhaim / LMRL-Gym
☆99 · Updated last year
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227 · Updated last year
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆174 · Updated 8 months ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90 · Updated last year
microsoft / RLHF-APA
RL algorithm: Advantage-Induced Policy Alignment
☆65 · Updated last year
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 · Updated 2 months ago
NohTow / PPL-MCTS
Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22
☆66 · Updated 2 years ago
princeton-nlp / TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
☆162 · Updated last year
google-deepmind / emergent_in_context_learning
☆84 · Updated last year
flowersteam / Grounding_LLMs_with_online_RL
We perform functional grounding of LLMs' knowledge in BabyAI-Text
☆268 · Updated 11 months ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆187 · Updated last year
Cornell-RL / tril
☆127 · Updated last year
minaek / reward_design_with_llms
☆220 · Updated 2 years ago
gabegrand / world-models
☆209 · Updated 2 years ago
thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆86 · Updated 2 years ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 · Updated last year
anthropics / ConstitutionalHarmlessnessPaper
☆240 · Updated 2 years ago
Sea-Snell / CALM-Dialogue
Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems"
☆34 · Updated 2 years ago
microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆140 · Updated last year
facebookresearch / motif
Intrinsic Motivation from Artificial Intelligence Feedback
☆130 · Updated last year
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆280 · Updated 3 weeks ago
likenneth / othello_world
Emergent world representations: Exploring a sequence model trained on a synthetic task
☆186 · Updated 2 years ago
iglu-contest / gridworld
A reinforcement learning environment for the IGLU 2022 at NeurIPS
☆34 · Updated 2 years ago
Dahoas / reward-modeling
☆96 · Updated 2 years ago
aypan17 / machiavelli
☆137 · Updated 2 weeks ago
booydar / LM-RMT
Recurrent Memory Transformer
☆150 · Updated last year
flowersteam / lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
☆236 · Updated 9 months ago
machelreid / can-wikipedia-help-offline-rl
Official code for "Can Wikipedia Help Offline Reinforcement Learning?" by Machel Reid, Yutaro Yamada and Shixiang Shane Gu
☆105 · Updated 3 years ago