jjkke88 / trpo

Trust region policy optimization based on Gym and TensorFlow; can run in distributed mode
15 · Updated 7 years ago

Related projects: