MinRegret / deluca
Performant, differentiable reinforcement learning
☆25 Updated last year
Related projects
Alternatives and complementary repositories for deluca
- ☆30 Updated last year
- JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation" ☆43 Updated 3 years ago
- Google AI Princeton control framework ☆38 Updated 4 years ago
- On the model-based stochastic value gradient for continuous reinforcement learning ☆55 Updated last year
- ☆28 Updated 3 years ago
- Code for reproducing experiments in Model-Based Active Exploration, ICML 2019 ☆78 Updated 5 years ago
- ☆33 Updated 4 years ago
- [CoRL 2020] COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning ☆31 Updated 4 years ago
- Code for the paper "Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction" ☆42 Updated last year
- Repository for the paper "Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors" ☆44 Updated 2 years ago
- ☆34 Updated last year
- Model-based reinforcement learning in TensorFlow ☆54 Updated 3 years ago
- Model-based reinforcement learning (generative simulator models and planning agents) ☆15 Updated 3 years ago
- ☆26 Updated 5 years ago
- Model-Based Reinforcement Learning via Latent-Space Collocation. ☆32 Updated last year
- Co-Adaptation of Algorithmic and Implementational Innovations in Inference-based Deep Reinforcement Learning (NeurIPS 2021) ☆20 Updated 3 years ago
- The code accompaniment for the CoRL 2020 paper: A User's Guide to Calibrating Robotics Simulators (https://arxiv.org/abs/2011.08985), fro… ☆29 Updated 4 years ago
- ☆27 Updated 3 years ago
- ☆17 Updated 3 years ago
- Improved Cross Entropy Method for trajectory optimization ☆69 Updated 3 years ago
- Companion code to CoRL 2018 paper: E Bıyık, D Sadigh. "Batch Active Preference-Based Learning of Reward Functions". Conference on Robot L… ☆28 Updated 5 years ago
- [ICLR 22] Value Gradient weighted Model-Based Reinforcement Learning. ☆24 Updated last year
- Infer how suboptimal agents are suboptimal while planning, for example if they are hyperbolic time discounters. ☆23 Updated 4 years ago
- ☆23 Updated 2 years ago
- Source code for the paper "Policy Architectures for Compositional Generalization in Control" ☆29 Updated 2 years ago
- Implementation of PILCO for the Model-Based Baselines Project ☆17 Updated 5 years ago
- Generalised UDRL ☆37 Updated 2 years ago
- ☆53 Updated 6 years ago
- My Body Is A Cage ☆38 Updated 3 years ago