vwxyzjn/ppo-implementation-details

The source code for the blog post The 37 Implementation Details of Proximal Policy Optimization

☆921

Alternatives and similar repositories for ppo-implementation-details

Users who are interested in ppo-implementation-details are comparing it to the libraries listed below.

vwxyzjn / cleanrl
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, T…
☆9,213 · Jul 8, 2025 · Updated 8 months ago
Lizhi-sjtu / DRL-code-pytorch
Concise pytorch implements of DRL algorithms, including REINFORCE, A2C, DQN, PPO(discrete and continuous), DDPG, TD3, SAC.
☆1,445 · Mar 29, 2023 · Updated 2 years ago
nikhilbarhate99 / PPO-PyTorch
Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
☆2,314 · Jul 9, 2024 · Updated last year
DLR-RM / stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
☆12,829 · Feb 21, 2026 · Updated 2 weeks ago
AI4Finance-Foundation / ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
☆4,297 · Feb 20, 2026 · Updated 2 weeks ago
sail-sg / envpool
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
☆1,274 · Aug 12, 2024 · Updated last year
marlbenchmark / on-policy
This is the official implementation of Multi-Agent PPO (MAPPO).
☆1,902 · Jul 18, 2024 · Updated last year
MarcoMeter / recurrent-ppo-truncated-bptt
Baseline implementation of recurrent PPO using truncated BPTT
☆160 · Apr 28, 2024 · Updated last year
DLR-RM / rl-baselines3-zoo
A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents include…
☆2,727 · Feb 26, 2026 · Updated last week
thu-ml / tianshou
An elegant PyTorch deep reinforcement learning library.
☆10,305 · Dec 1, 2025 · Updated 3 months ago
vwxyzjn / invalid-action-masking
Source Code for A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
☆167 · May 9, 2023 · Updated 2 years ago
pytorch / rl
A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
☆3,327 · Updated this week
opendilab / PPOxFamily
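PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons that help you sort out the algorithm theory, straighten out the code logic, and get hands-on with decision AI applications)
☆2,522 · Mar 13, 2025 · Updated 11 months ago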
alex-petrenko / sample-factory
High throughput synchronous and asynchronous reinforcement learning
☆973 · Jan 29, 2026 · Updated last month
Farama-Foundation / D4RL
A collection of reference environments for offline reinforcement learning
☆1,656 · Nov 18, 2024 · Updated last year
ericyangyu / PPO-for-Beginners
A simple and well styled PPO implementation. Based on my Medium series: https://medium.com/@eyyu/coding-ppo-from-scratch-with-pytorch-par…
☆1,214 · Oct 1, 2024 · Updated last year
openai / spinningup
An educational resource to help anyone learn deep reinforcement learning.
☆11,627 · Aug 5, 2024 · Updated last year
tinkoff-ai / CORL
High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC…
☆1,329 · Aug 3, 2023 · Updated 2 years ago
XinJingHao / PPO-Continuous-Pytorch
A clean and robust Pytorch implementation of PPO on continuous action space.
☆171 · Jun 8, 2024 · Updated last year
MarcoMeter / episodic-transformer-memory-ppo
Clean baseline implementation of PPO using an episodic TransformerXL memory
☆205 · Jun 18, 2024 · Updated last year
luchris429 / purejaxrl
Really Fast End-to-End Jax RL Implementations
☆1,028 · Sep 9, 2024 · Updated last year
Stable-Baselines-Team / stable-baselines3-contrib
Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
☆693 · Feb 6, 2026 · Updated last month
opendilab / DI-engine
OpenDILab Decision AI Engine. The Most Comprehensive Reinforcement Learning Framework B.P.
☆3,598 · Dec 7, 2025 · Updated 3 months ago
ikostrikov / pytorch-a2c-ppo-acktr-gail
PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinfor…
☆3,876 · May 29, 2022 · Updated 3 years ago
kzl / decision-transformer
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.
☆2,773 · Apr 29, 2024 · Updated last year
MarcoMeter / endless-memory-gym
Challenging Memory-based Deep Reinforcement Learning Agents
☆111 · Oct 27, 2024 · Updated last year
RobertTLange / gymnax
RL Environments in JAX 🌍
☆868 · May 30, 2025 · Updated 9 months ago
devindeng94 / smac-hard
Enabling Mixed Opponent Strategy Script and Self-play on SMAC
☆41 · Jul 24, 2025 · Updated 7 months ago
jidiai / Competition_Football
☆12 · Jun 17, 2022 · Updated 3 years ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆197 · Jan 14, 2024 · Updated 2 years ago
google-research / rliable
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
☆867 · Aug 12, 2024 · Updated last year
allenai / RL4LMs
A modular RL library to fine-tune language models to human preferences
☆2,380 · Mar 1, 2024 · Updated 2 years ago
clvrai / awesome-rl-envs
☆1,316 · May 27, 2024 · Updated last year
ikostrikov / jaxrl
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
☆753 · Oct 26, 2022 · Updated 3 years ago
openai / baselines
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
☆16,656 · Aug 1, 2024 · Updated last year
danijar / dreamerv3
Mastering Diverse Domains through World Models
☆2,885 · Sep 23, 2025 · Updated 5 months ago
opendilab / awesome-RLHF
A curated list of reinforcement learning with human feedback resources (continually updated)
☆4,317 · Dec 9, 2025 · Updated 3 months ago
rail-berkeley / softlearning
Softlearning is a reinforcement learning framework for training maximum entropy policies in continuous domains. Includes the official imp…
☆1,409 · Nov 29, 2023 · Updated 2 years ago
quantumiracle / Popular-RL-Algorithms
PyTorch implementation of Soft Actor-Critic (SAC), Twin Delayed DDPG (TD3), Actor-Critic (AC/A2C), Proximal Policy Optimization (PPO), QT…
☆1,332 · Mar 13, 2025 · Updated 11 months ago