raghavc / LLM-RLHF-Tuning-with-PPO-and-DPO

A comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward-model training, and support for the PPO and DPO algorithms, with configurations for the Alpaca, LLaMA, and LLaMA-2 models.
183 · Updated Mar 18, 2024
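
For context on the DPO algorithm the toolkit supports, the sketch below is a generic PyTorch illustration of the DPO loss (Rafailov et al., 2023); the function name and arguments are placeholders, not this repository's API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO loss, not code from this repository.

    Each argument is a tensor of per-sequence log-probabilities
    (summed token log-probs) under the trainable policy or the
    frozen reference model, for the chosen / rejected completions.
    """
    pi_logratio = policy_chosen_logps - policy_rejected_logps
    ref_logratio = ref_chosen_logps - ref_rejected_logps
    # Widen the margin between chosen and rejected completions while
    # staying close to the reference model; beta controls the trade-off.
    logits = beta * (pi_logratio - ref_logratio)
    return -F.logsigmoid(logits).mean()
```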

Alternatives and similar repositories for LLM-RLHF-Tuning-with-PPO-and-DPO

Users interested in LLM-RLHF-Tuning-with-PPO-and-DPO compare it to the libraries listed below.
