acyclics / MPO

PyTorch implementation of "Maximum a Posteriori Policy Optimization" with Retrace for discrete Gym environments
26 · Updated 4 years ago

Related projects

Alternatives and complementary repositories for MPO