waterhorse1 / LLM_Tree_Search

(ICML 2024) AlphaZero-like tree search can guide large language model decoding and training

☆278

Alternatives and similar repositories for LLM_Tree_Search

Users who are interested in LLM_Tree_Search are comparing it to the libraries listed below.


YuxiXie / MCTS-DPO
Source code for Self-Evaluation Guided MCTS for online DPO.
☆319 · Updated last year
MARIO-Math-Reasoning / Super_MARIO
☆337 · Updated 2 months ago
ezelikman / STaR
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
☆206 · Updated 2 years ago
YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆185 · Updated 3 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167 · Updated 2 months ago
expz / quiet-star
Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf)
☆54 · Updated last year
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆622 · Updated last month
OFA-Sys / gsm8k-ScRel
Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models"
☆268 · Updated 10 months ago
kanishkg / cognitive-behaviors
☆203 · Updated 4 months ago
THUDM / ReST-MCTS
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
☆656 · Updated 6 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆160 · Updated 4 months ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 · Updated last year
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆100 · Updated 3 weeks ago
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆133 · Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆166 · Updated 2 months ago
OpenBMB / UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
☆345 · Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123 · Updated 10 months ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆239 · Updated last year
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆107 · Updated last year
Linear95 / SPAG
Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024
☆137 · Updated 5 months ago
liziniu / ReMax
Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models"
☆189 · Updated last year
allenai / FineGrainedRLHF
☆278 · Updated 7 months ago
eddycmu / demystify-long-cot
☆309 · Updated 2 months ago
vwxyzjn / summarize_from_feedback_details
☆147 · Updated 8 months ago
sail-sg / oat-zero
A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.
☆245 · Updated 3 months ago
jwhj / OREO
☆114 · Updated 6 months ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆393 · Updated last month
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆335 · Updated last year
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆147 · Updated 9 months ago
Linear95 / APO
Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)".
☆56 · Updated last year