Johannes-H / nfsp-leduc

Neural Fictitious Self-Play in Leduc Holdem

☆10

Alternatives and similar repositories for nfsp-leduc

Users who are interested in nfsp-leduc are comparing it to the libraries listed below.

suyoung-lee / Episodic-Backward-Update
Implementation of "Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update", NeurIPS 2019.
☆16 Updated 5 years ago
mcmachado / count_based_exploration_sr
☆31 Updated 6 years ago
kkhetarpal / ioc
Options of Interest: Temporal Abstraction with Interest Functions AAAI 2020
☆25 Updated 5 years ago
nathangrinsztajn / Box-World
Implementation of the Box-World environment from the paper "Relational Deep Reinforcement Learning"
☆46 Updated last year
tdavchev / option-critic
A Tensorflow implementation of the Option-Critic Architecture
☆71 Updated 8 years ago
Hwhitetooth / lirpg
☆61 Updated 7 years ago
davidbrandfonbrener / onestep-rl
☆42 Updated 3 years ago
jeanharb / a2oc_delib
A3C style Option-Critic with deliberation cost
☆39 Updated 7 years ago
clvoloshin / COBS
OPE Tools based on Empirical Study of Off Policy Policy Estimation paper.
☆61 Updated 3 years ago
YuejiangLIU / prioritized_option_critic
Implementation of the Prioritized Option-Critic on the Four-Rooms Environment
☆16 Updated 7 years ago
veronicachelu / temporal_abstraction
Option Critic with subgoal discovery by spectral decomposition of the Successor Features Matrix or clustering in Successor features space…
☆23 Updated 6 years ago
ben-eysenbach / sac
Soft Actor-Critic
☆151 Updated 7 years ago
tesslerc / GAC
Code accompanying NeurIPS 2019 paper: "Distributional Policy Optimization - An Alternative Approach for Continuous Control"
☆22 Updated 5 years ago
thanard / me-trpo
☆92 Updated last year
RomainLaroche / SPIBB
Safe Policy Improvement with Baseline Bootstrapping
☆26 Updated 5 years ago
alshedivat / lola
Code release for Learning with Opponent-Learning Awareness and variations.
☆149 Updated 2 years ago
victorcampos7 / edl
Code for "Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills"
☆36 Updated 5 years ago
paulorauber / hpg
Hindsight policy gradients
☆45 Updated 5 years ago
tesatory / hsp
Hierarchical Self-Play
☆21 Updated 6 years ago
mrkulk / hierarchical-deep-RL
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstractions and Intrinsic Motivation
☆87 Updated 7 years ago
nnaisense / MAGE
Learning Action-Value Gradients in Model-based Policy Optimization
☆31 Updated 3 years ago
facebookresearch / hanabi_SAD
Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning
☆102 Updated 3 years ago
facebookresearch / level-replay
This code implements Prioritized Level Replay, a method for sampling training levels for reinforcement learning agents that exploits the …
☆88 Updated 4 years ago
Farama-Foundation / D4RL-Evaluations
☆200 Updated 2 years ago
lanyavik / BAIL
☆17 Updated 3 years ago
yifan12wu / rl-laplacian
Learning Laplacian Representations in Reinforcement Learning
☆17 Updated 4 years ago
AnujMahajanOxf / MAVEN
Submission for MAVEN: Multi-Agent Variational Exploration
☆58 Updated 3 years ago
RLAgent / state-marginal-matching
Efficient Exploration via State Marginal Matching (2019)
☆69 Updated 6 years ago
BorealisAI / pommerman-baseline
Code for the paper "Skynet: A Top Deep RL Agent in the Inaugural Pommerman Team Competition"
☆37 Updated 6 years ago
mklissa / PPOC
Proximal Policy Option-Critic
☆25 Updated 6 years ago