tedmoskovitz / ConstrainedRL4LMs

A library for constrained RLHF.

☆13

Alternatives and similar repositories for ConstrainedRL4LMs

Users who are interested in ConstrainedRL4LMs are comparing it to the libraries listed below.

abdulhaim / LMRL-Gym
☆102 Updated last year
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆193 Updated 5 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆175 Updated 4 months ago
alexrame / rewardedsoups
Rewarded soups official implementation
☆60 Updated 2 years ago
tlc4418 / llm_optimization
A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
☆44 Updated 8 months ago
flowersteam / lamorel
Lamorel is a Python library designed for RL practitioners eager to use Large Language Models (LLMs).
☆239 Updated 3 weeks ago
dunnolab / awesome-in-context-rl
Awesome In-Context RL: A curated list of In-Context Reinforcement Learning
☆235 Updated last month
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆295 Updated 2 months ago
flowersteam / Grounding_LLMs_with_online_RL
We perform functional grounding of LLMs' knowledge in BabyAI-Text
☆273 Updated last year
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆282 Updated last year
microsoft / SmartPlay
SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. …
☆141 Updated last year
alecwangcq / f-divergence-dpo
Direct preference optimization with f-divergences.
☆14 Updated 11 months ago
CraftJarvis / MC-Planner
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agen…
☆286 Updated 2 years ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆175 Updated 2 years ago
OpenDFM / Rememberer
[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
☆37 Updated last year
nicoladainese96 / code-world-models
Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24.
☆14 Updated 7 months ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆209 Updated 2 years ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆191 Updated last year
liziniu / ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
☆193 Updated last year
vwxyzjn / summarize_from_feedback_details
☆151 Updated 10 months ago
jhejna / cpl
Code for Contrastive Preference Learning (CPL)
☆176 Updated 10 months ago
GFNOrg / gfn-lm-tuning
☆186 Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 Updated last year
yuqingd / ellm
☆85 Updated 2 years ago
floodsung / LLM-with-RL-papers
A collection of LLM with RL papers
☆278 Updated last year
bigai-ai / civrealm
CivRealm is an interactive environment for the open-source strategy game Freeciv-web based on Freeciv, a Civilization-inspired game.
☆128 Updated last year
WeiXiongUST / Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…
☆30 Updated 10 months ago
YangRui2015 / Generalizable-Reward-Model
Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"
☆40 Updated 7 months ago
YangRui2015 / RiC
Code for the ICML 2024 paper "Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment"
☆75 Updated 4 months ago
facebookresearch / rlfh-gen-div
This is the code for most of the experiments in the paper "Understanding the Effects of RLHF on LLM Generalisation and Diversity"
☆47 Updated last year