CSKrishna / Optimal-bidding-policy-using-Policy-Gradient-in-a-Multi-agent-Contextual-Bandit-setting

We use policy gradient methods to help agents learn optimal bidding policies in a competitive multi-agent contextual bandit setting.
11 · Updated 6 years ago

Related projects: