moaradwan / deep-learning-contextual-bandits
Deep learning models for the contextual multi-armed bandit setting
☆13 · Updated 4 years ago
Alternatives and similar repositories for deep-learning-contextual-bandits
Users who are interested in deep-learning-contextual-bandits are comparing it to the repositories listed below.
- Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group ☆37 · Updated last year
- Vintix: Action Model via In-Context Reinforcement Learning (ICML 2025) ☆44 · Updated 6 months ago
- An implementation of PPO in PyTorch ☆101 · Updated 2 weeks ago
- XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning (ICLR 2025) ☆81 · Updated 10 months ago
- Synchronized Curriculum Learning for RL Agents ☆116 · Updated last month
- ☆67 · Updated 2 months ago
- Implementation of Soft Actor Critic and some of its improvements in PyTorch ☆60 · Updated 10 months ago
- Author's implementation of ReBRAC, a minimalist improvement upon TD3+BC ☆61 · Updated 2 years ago
- ☆15 · Updated last year
- Efficient World Models with Context-Aware Tokenization (ICML 2024) ☆114 · Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Updated last year
- Official implementation of "Latent Action Learning Requires Supervision in the Presence of Distractors", ICML 2025 ☆25 · Updated 5 months ago
- JAX implementation of VQVAE/VQGAN autoencoders (+FSQ) ☆40 · Updated last year
- JAX implementation of the Mistral 7B v0.1 model ☆13 · Updated last year
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆118 · Updated 3 weeks ago
- BASALT Benchmark datasets, evaluation code and agent training example. ☆21 · Updated 2 years ago
- Implementation of the new SOTA for model-based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in PyTorch ☆145 · Updated 7 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 · Updated last year
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks ☆36 · Updated last year
- Code for Contrastive Preference Learning (CPL) ☆177 · Updated last year
- A simple, performant and scalable JAX-based world modeling codebase ☆113 · Updated last month
- Implementation of Diffusion Transformers and Rectified Flow in JAX ☆27 · Updated last year
- Reinforcement Learning via Regressing Relative Rewards ☆38 · Updated last year
- ☆35 · Updated last year
- PyTorch implementation of MuZero Unplugged for Gym environments. This algorithm is capable of supporting a wide range of action and observ… ☆34 · Updated 5 months ago
- Minimal Decision Transformer Implementation written in JAX (Flax). ☆17 · Updated 3 years ago
- Gym environment for playing Wordle with RL agents ☆43 · Updated 3 years ago
- [AutoML'22] Bayesian Generational Population-based Training (BG-PBT) ☆29 · Updated 3 years ago
- CleanRL's implementation of DeepMind's Podracer Sebulba Architecture for Distributed DRL ☆118 · Updated last year
- A2C is a special case of PPO! ☆22 · Updated 3 years ago