yuniaXian / ppo_llm_DeepSpeed

Customized LLM PPO (reinforcement learning) pipeline built with DeepSpeed, for Amex external usage. Trains a reward model and actor-critic models against a reference supervised fine-tuned model.
1 · Updated 11 months ago

Alternatives and similar repositories for ppo_llm_DeepSpeed:

Users who are interested in ppo_llm_DeepSpeed are comparing it to the libraries listed below.