THUDM / TreeRL

TreeRL: LLM Reinforcement Learning with On-Policy Tree Search in ACL'25

☆48

Alternatives and similar repositories for TreeRL

Users interested in TreeRL are also comparing it to the repositories listed below.


test-time-interaction / TTI
☆53 · Updated last month
mathllm / Step-Controlled_DPO
☆22 · Updated last year
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆76 · Updated 4 months ago
DAMO-NLP-SG / LongPO
[ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
☆38 · Updated 5 months ago
Dereck0602 / Awesome_Test_Time_LLMs
☆117 · Updated 4 months ago
hkust-nlp / mstar
[ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning
☆64 · Updated 3 weeks ago
LAMDASZ-ML / Self-Backtracking
☆47 · Updated 5 months ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆32 · Updated 3 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆105 · Updated 3 months ago
uservan / ThinkPO
☆18 · Updated last week
THU-KEG / AdaptThink
☆140 · Updated 2 months ago
ReasoningTransfer / Transferability-of-LLM-Reasoning
☆80 · Updated 2 weeks ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆162 · Updated this week
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆109 · Updated 6 months ago
MiroMindAsia / MiroMind-M1
MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.
☆106 · Updated this week
Gen-Verse / CURE
Open-Source LLM Coders with Co-Evolving Reinforcement Learning
☆103 · Updated 2 weeks ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35 · Updated 10 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86 · Updated 10 months ago
SihengLi99 / SEALONG
Large Language Models Can Self-Improve in Long-context Reasoning
☆72 · Updated 8 months ago
RUCAIBox / R1-Searcher-plus
R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning
☆51 · Updated 2 months ago
hkust-nlp / GUIMid
☆21 · Updated 3 months ago
GeniusHTX / TALE
☆126 · Updated 2 months ago
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆68 · Updated 2 weeks ago
GuanghaoYe / Emergence-of-Thinking
☆53 · Updated 5 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆161 · Updated 2 weeks ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆29 · Updated this week
RM-R1-UIUC / RM-R1
RM-R1: Unleashing the Reasoning Potential of Reward Models
☆120 · Updated last month
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆87 · Updated 4 months ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆52 · Updated last month
UCSB-NLP-Chang / ThinkPrune
☆39 · Updated 3 months ago