raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO
Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model training, and support for PPO and DPO algorithms with various configurations for the Alpaca, LLaMA, and LLaMA2 models.
191 · Feb 24, 2026 · Updated 4 months ago

Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO

Users who are interested in LLM-RLHF-Tuning-with-PPO-and-DPO are comparing it to the libraries listed below.
