l294265421 / alpaca-rlhf

Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback), based on DeepSpeed Chat
112 · Updated last year

Alternatives and similar repositories for alpaca-rlhf:

Users who are interested in alpaca-rlhf are comparing it to the libraries listed below.