schinger / AlphaZero
Simplest AlphaZero Implementation
☆16Updated 6 months ago
Alternatives and similar repositories for AlphaZero
Users who are interested in AlphaZero are comparing it to the repositories listed below
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆76Updated last month
- Full stack LLM (Pre-training/finetuning, PPO(RLHF), Inference, Quant, etc.)☆19Updated 2 months ago
- ☆34Updated this week
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆54Updated 11 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆72Updated 2 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆58Updated 5 months ago
- Curation of resources for LLM research, screened by @tongyx361 to ensure high quality and accompanied with elaborately-written concise de…☆53Updated 10 months ago
- Pretrain, decay, SFT a CodeLLM from scratch 🧙‍♂️☆35Updated 11 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)☆49Updated last year
- ☆63Updated 5 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.☆25Updated last year
- Exploring whether LLMs perform case-based or rule-based reasoning☆28Updated last year
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆58Updated 2 months ago
- ☆47Updated 4 months ago
- ☆32Updated 5 months ago
- Enhances Overleaf by allowing article searches and BibTeX retrieval from DBLP and Google Scholar☆68Updated last month
- Domain-specific preference (DSP) data and customized RM fine-tuning.☆25Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆99Updated last year
- ☆45Updated 6 months ago
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…☆39Updated 9 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆57Updated 6 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆38Updated last year
- ☆30Updated 6 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆31Updated 4 months ago
- MPO: Boosting LLM Agents with Meta Plan Optimization☆51Updated 2 months ago
- ☆102Updated 5 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆77Updated this week
- On Memorization of Large Language Models in Logical Reasoning☆64Updated last month
- Knowledge-Reasoning Synergy Reinforcement Learning.☆35Updated 2 months ago
- ☆50Updated 3 months ago