raghavc / LLM-RLHF-Tuning-with-PPO-and-DPOLinks

Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.

☆174

Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO

Users that are interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below

Sorting:

Oxen-AI / Self-Rewarding-Language-Models
This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI.
☆130Updated 11 months ago
jwhj / OREO
☆116Updated 9 months ago
Linear95 / SPAG
Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024
☆141Updated 8 months ago
microsoft / simulated-trial-and-error
☆122Updated last year
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K, This repository provides a starting point for exploring reasoning.
☆136Updated 8 months ago
allenai / lumos
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
☆470Updated last year
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆232Updated 3 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆302Updated last month
SqueezeAILab / LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
☆190Updated last year
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆147Updated last year
ezelikman / STaR
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
☆214Updated 2 years ago
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆363Updated last year
open-thought / tiny-grpo
Minimal hackable GRPO implementation
☆297Updated 9 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆108Updated 3 months ago
agentification / RAFA_code
☆144Updated last year
jonathanmli / Avalon-LLM
This repository contains a LLM benchmark for the social deduction game `Resistance Avalon'
☆129Updated 5 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆347Updated 10 months ago
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆133Updated last year
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆283Updated last year
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆124Updated 5 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks
☆248Updated 5 months ago
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆98Updated 7 months ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆164Updated last year
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆369Updated 3 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172Updated 9 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151Updated 9 months ago
xfactlab / orpo
Official repository for ORPO
☆463Updated last year
kyegomez / Lets-Verify-Step-by-Step
"Improving Mathematical Reasoning with Process Supervision" by OPENAI
☆111Updated 2 weeks ago
OpenBMB / Eurus
☆320Updated last year
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆254Updated 6 months ago