mengdi-li / awesome-RLAIF

A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF)

☆192

Alternatives and similar repositories for awesome-RLAIF

Users who are interested in awesome-RLAIF are comparing it to the repositories listed below

haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆122 Updated 8 months ago
PKU-Alignment / aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
☆190 Updated 10 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆159 Updated last year
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆180 Updated 2 years ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆198 Updated 7 months ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆198 Updated last year
PKU-Alignment / AlignmentSurvey
AI Alignment: A Comprehensive Survey
☆137 Updated 2 years ago
louieworth / awesome-rlhf
An index of algorithms for reinforcement learning from human feedback (RLHF)
☆92 Updated last year
yihedeng9 / rlhf-summary-notes
A brief and partial summary of RLHF algorithms.
☆139 Updated 9 months ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆113 Updated last year
jwhj / OREO
☆117 Updated 10 months ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆432 Updated 2 months ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆147 Updated last year
LeapLabTHU / ExpeL
☆185 Updated 11 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization (TDPO)
☆148 Updated 9 months ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆368 Updated last year
zjunlp / WKM
[NeurIPS 2024] Agent Planning with World Knowledge Model
☆157 Updated 11 months ago
dongxiangjue / Awesome-LLM-Self-Improvement
A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …
☆97 Updated 11 months ago
waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆100 Updated 4 months ago
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OpenAI
☆113 Updated last month
Timothyxxx / WorldModelPapers
A collection of papers from the continuing line of work that started with World Models.
☆189 Updated last year
kanishkg / cognitive-behaviors
☆216 Updated 8 months ago
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆202 Updated last year
architsharma97 / dpo-rlaif
☆100 Updated last year
AGI-Edgerunners / LLM-Planning-Papers
Must-read Papers on Large Language Model (LLM) Planning.
☆433 Updated last year
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327 Updated last year
abdulhaim / LMRL-Gym
☆106 Updated last year
1989Ryan / llm-mcts
[NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…
☆290 Updated last year
rxlqn / awesome-llm-self-reflection
LLMs augmented with self-reflection
☆135 Updated 2 years ago
allenai / FineGrainedRLHF
☆281 Updated 11 months ago