schinger / AlphaZero
Simplest AlphaZero Implementation
☆26 · Updated last year
Alternatives and similar repositories for AlphaZero
Users who are interested in AlphaZero are comparing it to the libraries listed below.
- This repo showcases using MCTS with LLMs to solve GSM8K problems ☆93 · Updated last month
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆199 · Updated 2 years ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett… ☆291 · Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆114 · Updated last year
- ☆118 · Updated 8 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆145 · Updated 2 months ago
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing. ☆76 · Updated 10 months ago
- Reinforced Multi-LLM Agents training ☆60 · Updated 6 months ago
- ☆74 · Updated last month
- Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation" ☆18 · Updated last year
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆198 · Updated 8 months ago
- A visualization tool that aids deeper understanding and easier debugging of RLHF training. ☆274 · Updated 10 months ago
- Natural Language Reinforcement Learning ☆100 · Updated 4 months ago
- ☆68 · Updated last month
- A collection of papers on LLMs with RL ☆278 · Updated last year
- ☆32 · Updated last year
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆403 · Updated 5 months ago
- ☆160 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆149 · Updated 10 months ago
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs. ☆383 · Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆115 · Updated 4 months ago
- The Code Repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization ☆128 · Updated last year
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied by elaborately written concise de… ☆63 · Updated last year
- ☆68 · Updated last year
- Full stack LLM (Pre-training/finetuning, PPO(RLHF), Inference, Quant, etc.) ☆30 · Updated 10 months ago
- ☆321 · Updated 6 months ago
- TextStarCraft2, a pure-language environment that lets LLMs play StarCraft II ☆293 · Updated 7 months ago
- This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Met… ☆160 · Updated last year
- ☆87 · Updated 4 months ago
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)". ☆56 · Updated last year