raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward-model training, and support for the PPO and DPO algorithms, with configurations for the Alpaca, LLaMA, and LLaMA 2 models.
188 · Feb 24, 2026 · Updated last month
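
Since the toolkit advertises DPO support, a minimal sketch of the DPO objective may help orient readers. The function below is illustrative only, not taken from this repository: it assumes precomputed per-sequence log-probabilities from the trainable policy and a frozen reference model, and the names (`dpo_loss`, `beta`, the argument names) are hypothetical.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss (Rafailov et al., 2023).

    Each argument is a 1-D tensor of summed log-probabilities for a batch
    of (prompt, chosen, rejected) triples; beta scales the implicit reward.
    """
    # Implicit rewards: log-prob ratios of the policy against the reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the chosen-vs-rejected margin via a logistic (sigmoid) loss.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Unlike PPO, this formulation needs no separate reward model or rollout loop at training time; the preference signal enters directly through the log-probability ratios.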

Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO

Users interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below.

