microsoft / RLHF-APA

RL algorithm: Advantage-Induced Policy Alignment (APA)
62 · Updated last year

Related projects: