mzf666 / MATPO
Official implementation of MATPO: Multi-Agent Tool-Integrated Policy Optimization.
☆66 · Updated 2 months ago
Alternatives and similar repositories for MATPO
Users that are interested in MATPO are comparing it to the libraries listed below
- Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆305 · Updated last week
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆50 · Updated last year
- ☆176 · Updated last month
- The implementation for ICLR 2025 Oral: From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions. ☆52 · Updated 5 months ago
- [ACL'25] We propose a novel fine-tuning method, Separate Memory and Reasoning, which combines prompt tuning with LoRA. ☆82 · Updated 2 months ago
- ☆41 · Updated 4 months ago
- ☆299 · Updated 6 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆155 · Updated last year
- ☆111 · Updated 6 months ago
- This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …" ☆161 · Updated 7 months ago
- A research repo for experiments about Reinforcement Finetuning ☆53 · Updated 9 months ago
- A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model… ☆60 · Updated 6 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆134 · Updated 9 months ago
- ☆28 · Updated last year
- RM-R1: Unleashing the Reasoning Potential of Reward Models ☆156 · Updated 6 months ago
- ☆213 · Updated 7 months ago
- Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis' ☆32 · Updated 7 months ago
- ☆46 · Updated 2 weeks ago
- A comprehensive collection of process reward models. ☆131 · Updated 3 months ago
- Awesome Agentic Search is a curated list of papers, tools, and resources on agentic search, where AI agents plan, search, and reason to… ☆50 · Updated 4 months ago
- ☆404 · Updated 2 months ago
- ☆326 · Updated 7 months ago
- A comprehensive paper list of Table-based Question Answering. ☆34 · Updated 2 years ago
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆61 · Updated 3 months ago
- [arxiv: 2505.02156] Adaptive Thinking via Mode Policy Optimization for Social Language Agents ☆46 · Updated 6 months ago
- Our code for ICLR'25 paper "DataMan: Data Manager for Pre-training Large Language Models". ☆111 · Updated 4 months ago
- ☆70 · Updated 6 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆142 · Updated last month
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆101 · Updated last year
- Code implementation of synthetic continued pretraining ☆145 · Updated last year