mcguire-steve / hybrid-linucb
Hybrid Linear UCB Multi-arm Bandit library
☆14Oct 5, 2016Updated 9 years ago
Alternatives and similar repositories for hybrid-linucb
Users who are interested in hybrid-linucb are comparing it to the libraries listed below.
- Complete Reinforcement Learning Toolkit for Large Language Models!☆21Aug 2, 2025Updated 6 months ago
- Python code for the hybrid linear UCB bandit learning algorithm of L. Li (2010)☆56Dec 23, 2015Updated 10 years ago
- Open-source Human Feedback Library☆11Oct 25, 2023Updated 2 years ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best …☆10Nov 3, 2023Updated 2 years ago
- Reproduction of the paper "Soft Q-Learning with Mutual Information Regularization" CoRL 2019.☆10Jan 10, 2019Updated 7 years ago
- ☆11Mar 23, 2025Updated 10 months ago
- This is a fork of optimization part of RISO project (http://riso.sourceforge.net/)☆13Aug 30, 2015Updated 10 years ago
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 2 years ago
- ☆11Jan 12, 2023Updated 3 years ago
- 🚀 Sliding Window Attention Training for Efficient Large Language Models☆15Dec 8, 2025Updated 2 months ago
- An interactive story app for Android . . .☆15Dec 14, 2014Updated 11 years ago
- Learning bisimulation metrics for control, particularly suited to sparse reward settings☆10Feb 28, 2023Updated 2 years ago
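- Practice exercise: a Python implementation of an ALS collaborative filtering model for product recommendation and user audience expansion☆10Jun 4, 2020Updated 5 years ago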
- GraphQL and Rest API rewrite of the current Open Targets platform API☆15Updated this week
- ☆41Mar 14, 2024Updated last year
- Direct Preference Optimization for RWKV, aiming for RWKV-5 and 6.☆11Mar 1, 2024Updated last year
- Reproduced the DFT method without using Verl. https://arxiv.org/abs/2508.05629☆21Oct 14, 2025Updated 4 months ago
- A BLE keyboard firmware using nrf52810/52832☆12Dec 19, 2021Updated 4 years ago
- ☆13Jan 22, 2025Updated last year
- ☆11Jul 23, 2023Updated 2 years ago
- Official code repo for NeurIPS 2025 Spotlight paper, "Debate or Vote: Which Yields Better Decisions in Multi-Agent LLMs?"☆48Oct 15, 2025Updated 4 months ago
- dataX redis writer plugin☆12Jul 13, 2017Updated 8 years ago
- Causal Simulations for Uplift Modeling☆12Jan 22, 2020Updated 6 years ago
- ☆12Apr 13, 2024Updated last year
- Stream Data based News Recommendation - Contextual Bandit Approach☆47Nov 15, 2017Updated 8 years ago
- Count based exploration with the successor representation for Unity ML's Pyramid☆12Jun 19, 2019Updated 6 years ago
- Java version of liblbfgs: http://www.chokkan.org/software/liblbfgs/☆15Dec 16, 2020Updated 5 years ago
- REST full SimServer☆22Oct 27, 2022Updated 3 years ago
- ☆16Mar 13, 2023Updated 2 years ago
- This is a multilabel classification layer for mxnet.☆12Apr 1, 2016Updated 9 years ago
- A clone of the Sohu News UI for Android☆15Aug 28, 2013Updated 12 years ago
- a set of scripts to easily convert all training data from huggingface into alpaca instruct or sharegpt format, which should allow for eas…☆18Mar 14, 2025Updated 11 months ago
- A regularized version of RBM for unsupervised feature selection.☆13Nov 20, 2019Updated 6 years ago
- ☆16May 31, 2024Updated last year
- A simple API, written mainly by ChatGPT, for accessing ChatRWKV with simple requests☆15May 19, 2023Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated last year
- ChatGPT-like Web UI for RWKVstic☆19Apr 23, 2023Updated 2 years ago
- ☆16Feb 2, 2023Updated 3 years ago
- Code for paper "Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification"☆16Jul 4, 2023Updated 2 years ago