JingMog / THORLinks

Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".

☆27

Alternatives and similar repositories for THOR

Users that are interested in THOR are comparing it to the libraries listed below

Sorting:

hkust-nlp / model-task-align-rl
The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆15Updated 2 months ago
yayayacc / MUR
☆45Updated last month
jinzhuoran / RAG-RewardBench
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
☆16Updated 10 months ago
multimodal-art-projection / TreePO
☆44Updated last month
uservan / ThinkPO
☆17Updated 3 months ago
YujunZhou / EVOL-RL
Code for Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation (EVOL-RL).
☆39Updated 3 weeks ago
mathllm / Step-Controlled_DPO
☆23Updated last year
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆27Updated last month
aeroplanepaper / GRPO-LEAD
☆30Updated last month
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆24Updated 2 months ago
opendatalab / REST
☆32Updated 3 months ago
NuoJohnChen / JudgeLRM
JudgeLRM: Large Reasoning Models as a Judge
☆40Updated last month
lichengliu03 / unary-feedback
☆38Updated 2 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning
☆46Updated 2 weeks ago
EvanZhuang / AgenticLU
Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025).
☆10Updated last month
sail-sg / feedback-conditional-policy
Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"
☆51Updated last month
kokolerk / TON
[NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
☆47Updated last month
LCM-Lab / LOGO
Code for paper: Long cOntext aliGnment via efficient preference Optimization
☆23Updated 3 weeks ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆25Updated 5 months ago
open-compass / GPassK
[ACL 2025] Are Your LLMs Capable of Stable Reasoning?
☆31Updated 3 months ago
LAMDASZ-ML / Self-Backtracking
☆50Updated 8 months ago
czg1225 / dParallel
dParallel: Learnable Parallel Decoding for dLLMs
☆38Updated 3 weeks ago
hkust-nlp / RL-Verifier-Robustness
From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning.
☆23Updated last month
NUS-TRAIL / RAPID
☆17Updated 8 months ago
yale-nlp / refdpo
☆16Updated last year
tianyi-lab / C3PO
[COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"
☆18Updated 7 months ago
MDI-Benchmark / MDI-Benchmark
☆14Updated 10 months ago
xufangzhi / Genius
[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework
☆71Updated 5 months ago
KejiaZhang-Robust / TARS
TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs
☆23Updated last month
euclid-multimodal / Euclid
☆17Updated 10 months ago