mzf666 / MATPO
Official implementation of MATPO: Multi-Agent Tool-Integrated Policy Optimization.
★70 · Updated 3 months ago
Alternatives and similar repositories for MATPO
Users that are interested in MATPO are comparing it to the libraries listed below
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ★312 · Updated 3 weeks ago
- ★303 · Updated 6 months ago
- ★421 · Updated 3 months ago
- A research repo for experiments about Reinforcement Finetuning ★53 · Updated 9 months ago
- ★332 · Updated 8 months ago
- MemGen: Weaving Generative Latent Memory for Self-Evolving Agents ★290 · Updated 2 months ago
- A comprehensive collection of process reward models. ★134 · Updated 3 months ago
- ★177 · Updated last month
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ★258 · Updated 5 months ago
- [ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ★351 · Updated 2 weeks ago
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ★179 · Updated 6 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ★155 · Updated last year
- The implementation for ICLR 2025 Oral: From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions. ★52 · Updated 5 months ago
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ★340 · Updated last week
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …" ★169 · Updated 8 months ago
- REverse-Engineered Reasoning for Open-Ended Generation ★89 · Updated 4 months ago
- Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search, where AI agents plan, search, and reason to… ★52 · Updated 5 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ★50 · Updated last year
- ★48 · Updated last month
- Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework ★196 · Updated 2 weeks ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ★83 · Updated 2 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ★406 · Updated 3 months ago
- The official code of ARPO & AEPO ★872 · Updated 3 weeks ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis' ★32 · Updated 8 months ago
- ★57 · Updated 7 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ★55 · Updated last year
- ★489 · Updated 3 months ago
- ★223 · Updated 3 weeks ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ★414 · Updated 6 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ★95 · Updated 2 months ago