li-plus / nanoRLHF

Train a tiny LLaMA model from scratch to repeat your words using Reinforcement Learning from Human Feedback (RLHF)
16 · May 23, 2024 · Updated last year

Alternatives and similar repositories for nanoRLHF

Users who are interested in nanoRLHF are comparing it to the libraries listed below.
