schinger / AlphaZero
Simplest AlphaZero Implementation
☆24Updated 10 months ago
Alternatives and similar repositories for AlphaZero
Users who are interested in AlphaZero are comparing it to the repositories listed below
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…☆283Updated 10 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆193Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆110Updated last year
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆91Updated 6 months ago
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24.☆11Updated 7 months ago
- Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation"☆18Updated 11 months ago
- Natural Language Reinforcement Learning☆97Updated last month
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs.☆377Updated last year
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"☆40Updated 7 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆190Updated 5 months ago
- A collection of papers on LLMs with RL☆277Updated last year
- Awesome In-Context RL: A curated list of In-Context Reinforcement Learning☆230Updated 2 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆148Updated 7 months ago
- Reinforced Multi-LLM Agents training☆45Updated 3 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆105Updated last month
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training☆280Updated last year
- TextStarCraft2, a pure-language environment that supports LLMs playing StarCraft II☆288Updated 5 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents☆37Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆149Updated 9 months ago
- [AAAI 2024 Oral] ProAgent: Building Proactive Cooperative Agents with Large Language Models☆90Updated 6 months ago
- [NeurIPS 2023 FMDM Workshop] Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks☆192Updated last year
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de…☆61Updated last year
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆136Updated 2 months ago
- This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Met…☆154Updated last year
- ICLR 2021: "Monte-Carlo Planning and Learning with Language Action Value Estimates"☆33Updated last year