jiamingkong / rwkv_reward

Training a reward model for RLHF using RWKV.

Related projects: