dongguanting / Tool-Star

🔧Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning

☆236

Alternatives and similar repositories for Tool-Star

Users interested in Tool-Star are comparing it to the repositories listed below.

qiancheng0 / ToolRL
☆317 Updated 2 months ago
GAIR-NLP / ToRL
☆271 Updated 3 months ago
ADaM-BJTU / OpenRFT
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
☆148 Updated 8 months ago
LightChen233 / reasoning-boundary
☆67 Updated 2 months ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆128 Updated 4 months ago
RyanLiu112 / GenPRM
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆81 Updated 2 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆282 Updated last month
bruno686 / Awesome-Agent-Training
Awesome Agent Training
☆213 Updated 2 weeks ago
dongguanting / ARPO
The official code of “Agentic Reinforced Policy Optimization”, an agentic RL algorithm optimization.
☆482 Updated last week
TsinghuaC3I / MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
☆208 Updated this week
OpenBMB / RLPR
Extrapolating RLVR to General Domains without Verifiers
☆140 Updated last week
THU-KEG / AdaptThink
☆144 Updated 2 months ago
0russwest0 / Awesome-Agent-RL
☆349 Updated last week
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆248 Updated 3 months ago
dongguanting / FollowRAG
The demo, code and data of FollowRAG
☆74 Updated last month
InternLM / POLAR
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
☆147 Updated last month
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆104 Updated last month
MingyuJ666 / Disentangling-Memory-and-Reasoning
[ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA.
☆73 Updated last month
TIGER-AI-Lab / verl-tool
A version of verl to support tool use
☆333 Updated this week
WooooDyy / MathCritique
Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".
☆56 Updated 8 months ago
ReTool-RL / ReTool
☆187 Updated last week
OPPO-PersonalAI / Agent_Foundation_Models
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
☆36 Updated this week
ADaM-BJTU / AutoCoA
AutoCoA (Automatic generation of Chain-of-Action) is an agent model framework that enhances the multi-turn tool usage capability of reaso…
☆124 Updated 5 months ago
CJReinforce / PURE
Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"
☆133 Updated last month
zjunlp / WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
☆118 Updated 6 months ago
bobxwu / learning-from-rewards-llm-papers
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆53 Updated 2 months ago
open-compass / GTA
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
☆118 Updated 4 months ago
QingyangZhang / Label-Free-RLVR
☆261 Updated last month
PRIME-RL / Entropy-Mechanism-of-RL
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.
☆296 Updated last month
RUCAIBox / SimpleDeepSearcher
SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis
☆97 Updated 2 months ago