vsubramaniam851 / multiagent-ft
☆196Updated 2 months ago
Alternatives and similar repositories for multiagent-ft:
Users that are interested in multiagent-ft are comparing it to the libraries listed below
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆182Updated last week
- TTRL: Test-Time Reinforcement Learning☆166Updated this week
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆122Updated last month
- ☆107Updated 3 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆93Updated 6 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆139Updated this week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆204Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆191Updated last month
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"☆68Updated 2 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆84Updated last month
- ☆81Updated this week
- AWM: Agent Workflow Memory☆262Updated 2 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective☆63Updated last month
- Official implementation of paper "Process Reward Model with Q-value Rankings"☆56Updated 2 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆143Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆171Updated 3 months ago
- ☆283Updated last month
- ☆157Updated 3 weeks ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆186Updated 9 months ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆79Updated 2 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆86Updated last month
- ☆91Updated 2 months ago
- ☆70Updated 5 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆55Updated 2 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆63Updated last month
- Benchmarking LLMs with Challenging Tasks from Real Users☆221Updated 5 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)☆178Updated 3 weeks ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆134Updated 5 months ago
- Code for "Reasoning to Learn from Latent Thoughts"☆91Updated 3 weeks ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆194Updated 9 months ago