annieyan / Bandits-using-UCB-algorithm
Thompson Sampling for Bandits using UCB policy
☆10Updated 7 years ago
Related projects:
- Contextual Bandit Algorithms (+Bandit Algorithms)☆22Updated 4 years ago
- Thompson Sampling Tutorial☆49Updated 5 years ago
- Code associated with the NeurIPS19 paper "Weighted Linear Bandits in Non-Stationary Environments"☆17Updated 4 years ago
- ☆26Updated 4 years ago
- Non-stationary bandit for experiments with Reinforcement Learning☆34Updated 7 years ago
- Paper list in the area of reinforcement learning for recommendation systems☆24Updated 4 years ago
- Implementations of basic concepts dealt with under the Reinforcement Learning umbrella. This project is a collection of assignments in CS747: F…☆17Updated 6 years ago
- Contextual bandit in Python☆111Updated 3 years ago
- Direct Gibbs sampling for DPMM using Python.☆16Updated 7 years ago
- In this notebook several classes of multi-armed bandits are implemented. This includes epsilon greedy, UCB, Linear UCB (Contextual bandit…☆76Updated 3 years ago
- Dynamic Pricing BwK Problem and Reinforcement Learning☆28Updated 5 years ago
- Policy gradient reinforcement learning algorithm with importance sampling☆31Updated 6 years ago
- Simple implementation of GP-UCB algorithm.☆49Updated 7 years ago
- My solutions to Berkeley's CS294 (Deep Reinforcement Learning) Homework☆36Updated 6 years ago
- An official JAX-based code for our NeuraLCB paper, "Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization", ICLR…☆13Updated 2 years ago
- Deconfounding Reinforcement Learning in Observational Settings☆48Updated 5 years ago
- Ordered Preference Elicitation Strategies for Multi-Objective Decision Making using Gaussian Processes☆24Updated 6 years ago
- Code to reproduce Supervised Policy Update (ICLR 2019)☆17Updated last year
- Semi-synthetic experiments to test several approaches for off-policy evaluation and optimization of slate recommenders.☆43Updated 6 years ago
- Code for "Best arm identification in multi-armed bandits with delayed feedback", AISTATS 2018.☆19Updated 6 years ago
- Code of ICML-2020 paper Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising☆27Updated 4 years ago
- Experiments on a discrete mean field game model of population dynamics with reinforcement learning☆30Updated 11 months ago
- More about the exploration-exploitation tradeoff with harder bandits☆23Updated 5 years ago
- Bayesian Uncertainty Exploration in Deep Reinforcement Learning☆17Updated 7 years ago
- ☆14Updated 3 years ago
- Feature selection for maximizing expected cumulative reward☆29Updated 6 years ago
- ☆28Updated 4 years ago
- ☆28Updated last year
- Upper Confidence Tree Planner for ATARI games☆19Updated 8 years ago
- Study NeuralUCB and regret analysis for contextual bandit with neural decision☆90Updated 2 years ago