☆55 · Feb 19, 2025 · Updated last year
Alternatives and similar repositories for agent_prm
Users that are interested in agent_prm are comparing it to the libraries listed below.
- Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆268 · May 5, 2025 · Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆164 · Oct 30, 2024 · Updated last year
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆16 · Feb 11, 2023 · Updated 3 years ago
- Using Vrep to simulate a six-legged robot to do motion planning & path planning ☆10 · Jan 10, 2019 · Updated 7 years ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M… ☆28 · Mar 14, 2024 · Updated 2 years ago
- [TMLR] Process Reward Models That Think ☆89 · Nov 29, 2025 · Updated 6 months ago
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation". ☆30 · Aug 9, 2025 · Updated 10 months ago
- Dump data from LG VCR HDD ☆13 · Jan 17, 2017 · Updated 9 years ago
- Agentic Virtual Lab ☆20 · Nov 30, 2025 · Updated 6 months ago
- ☆45 · May 10, 2026 · Updated last month
- Code repo for "Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning" ☆33 · Jul 25, 2025 · Updated 10 months ago
- FamilyTool benchmark ☆13 · Sep 10, 2025 · Updated 9 months ago
- Resources and paper list for 'Scaling Environments for Agents'. This repository accompanies our survey on how environments contribute to … ☆70 · Jan 28, 2026 · Updated 4 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆69 · Feb 5, 2025 · Updated last year
- [NAACL'25] Evaluating LLMs for Causal Queries ☆14 · Feb 18, 2025 · Updated last year
- Demonstrating the BadAss issue. ☆17 · May 19, 2025 · Updated last year
- ☆115 · Jan 8, 2025 · Updated last year
- ☆36 · May 29, 2025 · Updated last year
- [ICLR'26] Stronger-MAS: A RL Framework for multi LLM agent system; [arxiv] MetaAgent-X: End-to-End Reinforcement Learning Automatic Mult… ☆185 · May 15, 2026 · Updated last month
- ☆36 · May 24, 2025 · Updated last year
- Collection of LLM completions for reasoning-gym task datasets ☆31 · Jul 4, 2025 · Updated 11 months ago
- ☆11 · Oct 3, 2022 · Updated 3 years ago
- VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments ☆25 · Sep 30, 2025 · Updated 8 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆73 · Apr 22, 2025 · Updated last year
- ☆23 · Sep 19, 2024 · Updated last year
- ☆50 · Oct 28, 2024 · Updated last year
- Differentiable non-uniform interpolation: https://arxiv.org/abs/2012.13257 ☆11 · Oct 3, 2021 · Updated 4 years ago
- ☆10 · Feb 18, 2020 · Updated 6 years ago
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision ☆19 · Apr 1, 2025 · Updated last year
- grpo to train long form QA and instructions with long-form reward model ☆17 · Jul 17, 2025 · Updated 11 months ago
- Training VLM agents with multi-turn reinforcement learning ☆472 · May 11, 2026 · Updated last month
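- Seat-grabbing program for the Wuhan University Information Science branch library (信图). Supports continuous background monitoring, grabbing window seats and seats with computers, and automatic shutdown once a seat is secured ☆15 · Dec 8, 2022 · Updated 3 years ago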
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆206 · Apr 17, 2025 · Updated last year
- VehicleWorld is the first comprehensive multi-device environment for intelligent vehicle interaction that accurately models the complex, … ☆22 · Sep 16, 2025 · Updated 9 months ago
- ☆28 · Jul 18, 2025 · Updated 11 months ago
- IAN: An Intelligent System for Omics Data Analysis and Discovery ☆10 · Feb 23, 2026 · Updated 3 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆68 · Oct 18, 2024 · Updated last year
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training" ☆172 · Oct 20, 2025 · Updated 7 months ago
- Regularly Truncated M-estimators for Learning with Noisy Labels ☆11 · Apr 24, 2024 · Updated 2 years ago