JingMog / THOR
Official Implementation of our paper "THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning".
☆23Updated this week
Alternatives and similar repositories for THOR
Users who are interested in THOR are comparing it to the repositories listed below
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".☆12Updated 2 weeks ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Updated 9 months ago
- ☆18Updated last month
- ☆10Updated 5 months ago
- ☆45Updated last week
- JudgeLRM: Large Reasoning Models as a Judge☆38Updated this week
- Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'☆25Updated 4 months ago
- ☆22Updated last year
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models☆41Updated last month
- ☆24Updated this week
- ☆16Updated last year
- ☆47Updated 7 months ago
- ☆34Updated 3 weeks ago
- ☆14Updated 9 months ago
- MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion (ACL 2025)☆29Updated 2 months ago
- ☆36Updated last month
- Code for "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆18Updated 5 months ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning?☆30Updated last month
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆22Updated last month
- Official implementation of Self-Taught Agentic Long Context Understanding (ACL 2025).☆10Updated 2 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆22Updated last month
- [ACL 2025 Findings] Official implementation of the paper "Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning".☆19Updated 6 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization☆46Updated 2 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs☆29Updated 4 months ago
- ☆18Updated 9 months ago
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning☆23Updated last week
- ☆19Updated 6 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆56Updated 3 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆46Updated 11 months ago
- Control LLM☆19Updated 5 months ago