schinger / AlphaZero
Simplest AlphaZero Implementation
☆22Updated 9 months ago
Alternatives and similar repositories for AlphaZero
Users that are interested in AlphaZero are comparing it to the libraries listed below
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆107Updated last year
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆86Updated 4 months ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…☆280Updated 8 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆190Updated last year
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆54Updated 3 weeks ago
- Natural Language Reinforcement Learning☆92Updated last week
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆185Updated 3 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆56Updated last year
- A comprehensive list of papers, codebases, and datasets on decision making using foundation models, including LLMs and VLMs.☆374Updated last year
- Reinforced Multi-LLM Agents training☆35Updated 2 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆147Updated 9 months ago
- The Code Repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization☆117Updated 11 months ago
- ☆32Updated 9 months ago
- TextStarCraft2, a pure-language environment that supports LLMs playing StarCraft II☆286Updated 3 months ago
- AAAI24(Oral) ProAgent: Building Proactive Cooperative Agents with Large Language Models☆89Updated 5 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents☆34Updated last year
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback☆112Updated 4 months ago
- Reasoning with Language Model is Planning with World Model☆168Updated last year
- ☆103Updated 8 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization☆64Updated 5 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆133Updated 3 weeks ago
- ☆14Updated last year
- A research repo for experiments about Reinforcement Finetuning☆50Updated 4 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆146Updated 5 months ago
- ☆129Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆147Updated 7 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆35Updated last month
- A collection of LLM with RL papers☆276Updated last year
- ☆65Updated 8 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆56Updated 8 months ago