On-Policy Policy Gradient Algorithms in JAX
☆42 · Jan 25, 2024 · Updated 2 years ago
Alternatives and similar repositories for PolicyGradientsJax
Users interested in PolicyGradientsJax are comparing it to the libraries listed below.
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc. ☆14 · May 28, 2025 · Updated 9 months ago
- 3rd placed submission to the NeurIPS MineRL competition 2019 ☆10 · Mar 24, 2023 · Updated 2 years ago
- Various reinforcement learning algorithms written in Jax + Flax ☆26 · Jun 24, 2023 · Updated 2 years ago
- Official repository for our paper on "Action Inference by Maximising Evidence: Zero-Shot Imitation from Observation with World Models" ☆13 · Dec 4, 2023 · Updated 2 years ago
- ☆16 · Jul 16, 2024 · Updated last year
- Jax-Baseline is a Reinforcement Learning implementation using JAX and Flax/Haiku libraries, mirroring the functionality of Stable-Baselin… ☆64 · Jan 2, 2026 · Updated 2 months ago
- Decision Transformer JAX - Reproduction of 'Decision Transformer: Reinforcement Learning via Sequence Modeling' in JAX and Haiku ☆13 · Aug 14, 2024 · Updated last year
- [deprecated] Engine Agnostic Gym Environment for Robotics ☆17 · Feb 10, 2022 · Updated 4 years ago
- High quality implementations of imitation and inverse reinforcement learning algorithms ☆22 · Aug 19, 2025 · Updated 6 months ago
- ☆17 · Aug 2, 2022 · Updated 3 years ago
- ☆18 · Jul 25, 2024 · Updated last year
- Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation" ☆18 · Oct 5, 2024 · Updated last year
- ☆19 · Apr 22, 2024 · Updated last year
- Implementation of the HIERarchical imagination On Structured State Space Sequence Models (HIEROS) paper ☆21 · Jul 14, 2024 · Updated last year
- ☆15 · Apr 5, 2023 · Updated 2 years ago
- Visualization of a multi-agent reinforcement learning (MARL)-based strategy with efficient exploration ☆20 · Oct 28, 2022 · Updated 3 years ago
- Decision Mamba: Reinforcement Learning via Sequence Modeling with Selective State Spaces ☆49 · Apr 1, 2024 · Updated last year
- Implementing different learning algorithms and analyzing their performance in a Markov game model called the Soccer Game ☆23 · Jan 29, 2023 · Updated 3 years ago
- Drop-in environment replacements that make your RL algorithm train faster. ☆21 · Jun 19, 2024 · Updated last year
- ☆22 · May 14, 2021 · Updated 4 years ago
- Dreamer 4 jax implementation ☆69 · Nov 28, 2025 · Updated 3 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses". ☆31 · Aug 18, 2024 · Updated last year
- (NeurIPS '22) LISA: Learning Interpretable Skill Abstractions - A framework for unsupervised skill learning using Imitation ☆29 · Feb 22, 2023 · Updated 3 years ago
- Clean single-file implementation of offline RL algorithms in JAX ☆170 · Nov 24, 2025 · Updated 3 months ago
- Customizable RecSys Simulator for OpenAI Gym ☆26 · Dec 7, 2021 · Updated 4 years ago
- A Julia package for consensus-based optimisation ☆16 · Nov 28, 2025 · Updated 3 months ago
- A collection of RL algorithms written in JAX. ☆105 · Jul 5, 2022 · Updated 3 years ago
- Benchmarking RL for POMDPs in Pure JAX [Code for "Structured State Space Models for In-Context Reinforcement Learning" (NeurIPS 2023)] ☆112 · Dec 5, 2023 · Updated 2 years ago
- Code for Powderworld: A Platform for Understanding Generalization via Rich Task Distributions ☆73 · Aug 31, 2024 · Updated last year
- Provides a framework for multi-agent reinforcement learning (MARL) with Unity ☆24 · Apr 17, 2022 · Updated 3 years ago
- Data-driven offline simulation for online reinforcement learning: benchmark and baselines ☆31 · Jul 25, 2024 · Updated last year
- The Controllable Agent project trains RL Agents able to optimize any reward function specified in real time, without any further learning… ☆70 · Jul 17, 2023 · Updated 2 years ago
- Optim4RL is a Jax framework of learning to optimize for reinforcement learning. ☆28 · Nov 27, 2024 · Updated last year
- [TMLR 2025 & ICLR 2025 DeLTa] Official Implementation of Design Editing for Offline Model-based Optimization 🧬 🤖 ☆10 · Apr 17, 2025 · Updated 10 months ago
- JAX-accelerated Meta-Reinforcement Learning Environments Inspired by XLand and MiniGrid 🏎️ ☆325 · Dec 16, 2025 · Updated 2 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Apr 20, 2024 · Updated last year
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆33 · Dec 14, 2023 · Updated 2 years ago
- off-policy RL on long sequences ☆159 · Feb 17, 2026 · Updated 2 weeks ago
- ☆12 · Sep 21, 2023 · Updated 2 years ago