WeiminXiong / MPO
MPO: Boosting LLM Agents with Meta Plan Optimization
☆51 · Updated 2 months ago
Alternatives and similar repositories for MPO:
Users interested in MPO are comparing it to the repositories listed below.
- ☆47 · Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 · Updated 6 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆90 · Updated 2 months ago
- ☆153 · Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 5 months ago
- ☆55 · Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆138 · Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆72 · Updated 2 weeks ago
- ☆45 · Updated 6 months ago
- ☆109 · Updated 3 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆89 · Updated 2 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆93 · Updated this week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆119 · Updated last month
- ☆121 · Updated this week
- ☆102 · Updated 5 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆93 · Updated last month
- ☆132 · Updated 2 weeks ago
- ☆42 · Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆96 · Updated 6 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆41 · Updated 2 weeks ago
- This is the implementation of LeCo ☆31 · Updated 3 months ago
- ☆32 · Updated this week
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆75 · Updated last month
- Reformatted Alignment ☆115 · Updated 7 months ago
- MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models ☆42 · Updated 2 months ago
- ☆122 · Updated 10 months ago
- A research repo for experiments about Reinforcement Finetuning ☆46 · Updated last month
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" ☆82 · Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆99 · Updated last year
- ☆93 · Updated 3 months ago