vmicheli / lm-butlers

☆12

Alternatives and similar repositories for lm-butlers

Users interested in lm-butlers are comparing it to the libraries listed below.

facebookresearch / rlfh-gen-div
This is code for most of the experiments in the paper Understanding the Effects of RLHF on LLM Generalisation and Diversity
☆44 Updated last year
vzhong / silg
☆20 Updated 3 years ago
abdulhaim / LMRL-Gym
☆99 Updated last year
google-deepmind / emergent_in_context_learning
☆84 Updated last year
cicl-stanford / procedural-evals-tom
☆33 Updated 2 years ago
bigai-nlco / langsuite
Official Repo of LangSuitE
☆84 Updated 11 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 Updated 2 months ago
liziniu / policy_optimization
Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)
☆28 Updated last year
jlin816 / rewards-from-language
Code and data for "Inferring Rewards from Language in Context" [ACL 2022].
☆15 Updated 3 years ago
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆174 Updated 8 months ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆208 Updated 2 years ago
cognitiveailab / TextWorldExpress
Super fast implementations of common benchmark text world games
☆50 Updated 4 months ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆185 Updated 3 months ago
linlu-qiu / lm-inductive-reasoning
☆34 Updated last year
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆283 Updated 3 weeks ago
ShuangLI59 / Pre-Trained-Language-Models-for-Interactive-Decision-Making
Pre-Trained Language Models for Interactive Decision-Making [NeurIPS 2022]
☆128 Updated 3 years ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆58 Updated last year
flowersteam / Grounding_LLMs_with_online_RL
We perform functional grounding of LLMs' knowledge in BabyAI-Text
☆268 Updated 11 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated 11 months ago
microsoft / RLHF-APA
RL algorithm: Advantage induced policy alignment
☆65 Updated last year
microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆140 Updated last year
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 Updated last year
tianjunz / HIR
☆159 Updated 2 years ago
gregorbachmann / Next-Token-Failures
☆89 Updated last year
OpenDFM / Rememberer
[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
☆34 Updated last year
lupantech / PromptPG
Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".
☆156 Updated last year
ZhaofengWu / counterfactual-evaluation
☆56 Updated 2 months ago
iglu-contest / gridworld
A reinforcement learning environment for the IGLU 2022 at NeurIPS
☆34 Updated 2 years ago
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆112 Updated 4 months ago
ZhaolinGao / REBEL
Reinforcement Learning via Regressing Relative Rewards
☆34 Updated 7 months ago