schinger / AlphaZero
Simplest AlphaZero Implementation
☆25, updated 11 months ago
Alternatives and similar repositories for AlphaZero
Users interested in AlphaZero are comparing it to the libraries listed below.
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems (☆91, updated 7 months ago)
- Implementation of the ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation" (☆18, updated last year)
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… (☆112, updated last year)
- ☆33, updated last year
- [NeurIPS 2023] We use large language models as a commonsense world model and a heuristic policy within Monte Carlo Tree Search, enabling bett… (☆287, updated 11 months ago)
- ☆17, updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" (☆194, updated last year)
- TextStarCraft2, a pure-language environment that lets LLMs play StarCraft II (☆289, updated 6 months ago)
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" (☆196, updated 6 months ago)
- A comprehensive list of papers, codebases, and datasets on decision making using foundation models, including LLMs and VLMs (☆379, updated last year)
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback (☆121, updated 7 months ago)
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied by elaborately written concise de… (☆61, updated last year)
- Natural Language Reinforcement Learning (☆99, updated 3 months ago)
- Reasoning with Language Model is Planning with World Model (☆175, updated 2 years ago)
- Reference implementation for Token-level Direct Preference Optimization (TDPO) (☆148, updated 8 months ago)
- Reinforced Multi-LLM Agents training (☆56, updated 4 months ago)
- The code repo for Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (☆123, updated last year)
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization (☆91, updated last year)
- [AAAI'24 Oral] ProAgent: Building Proactive Cooperative Agents with Large Language Models (☆92, updated 7 months ago)
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning (☆360, updated 3 months ago)
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) (☆42, updated last year)
- A collection of papers on LLMs with RL (☆278, updated last year)
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents (☆38, updated last year)
- ☆114, updated 6 months ago
- The official implementation of the paper "Leveraging Dual Process Theory in Language Agent Framework for Simultaneous Human-AI Collab… (☆42, updated 2 weeks ago)
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)" (☆57, updated last year)
- PyTorch implementations of offline preference-based RL (PbRL) algorithms (☆21, updated 7 months ago)