sanjibanc/agent_prm

☆60

Alternatives and similar repositories for agent_prm

Users who are interested in agent_prm are comparing it to the repositories listed below.

facebookresearch / sweet_rl
Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
☆271 · May 5, 2025 · Updated last year
mukhal / ThinkPRM
[TMLR] Process Reward Models That Think
☆90 · Nov 29, 2025 · Updated 8 months ago
Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆168 · Oct 30, 2024 · Updated last year
xiye17 / TextualExplInContext
The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022)
☆16 · Feb 11, 2023 · Updated 3 years ago
xinzhel / LLM-Search
Survey on LLM Inference via Search (TMLR 2025)
☆15 · May 6, 2025 · Updated last year
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆28 · Mar 14, 2024 · Updated 2 years ago
AICourseTeamHatchet / VrepMotionPlanning
Using Vrep to simulate a six-legged robot to do motion planning & path planning
☆10 · Jan 10, 2019 · Updated 7 years ago
RyanLiu112 / GenPRM
[AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
☆102 · Nov 8, 2025 · Updated 8 months ago
OpenMOSS / Embodied-Planner-R1
Embodied-Planner-R1: Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
☆27 · Mar 30, 2026 · Updated 3 months ago
alfworld / alfworld
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
☆813 · Feb 8, 2026 · Updated 5 months ago
portal-cornell / robotouille
☆45 · May 10, 2026 · Updated 2 months ago
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆40 · Jul 13, 2024 · Updated 2 years ago
WEIRDLabUW / dispo
Distributional Successor Features Enable Zero-Shot Policy Optimization
☆15 · Apr 11, 2025 · Updated last year
RyanLiu112 / Awesome-Process-Reward-Models
A comprehensive collection of process reward models.
☆176 · Jun 6, 2026 · Updated last month
ByteDance-Seed / Agent-R
Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"
☆174 · Oct 20, 2025 · Updated 9 months ago
pollen-robotics / reachy2_mujoco_assets
☆18 · Jan 6, 2026 · Updated 6 months ago
princeton-nlp / WebShop
[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
☆574 · Sep 6, 2024 · Updated last year
XiaojuanTang / Mars
A benchmark for evaluating situated inductive reasoning
☆16 · Jan 7, 2025 · Updated last year
yxzwang / FamilyTool
FamilyTool benchmark
☆14 · Sep 10, 2025 · Updated 10 months ago
Shangyint / langProBe
☆33 · Jan 31, 2026 · Updated 5 months ago
xzhou98 / GBTL-attack
☆18 · Jun 4, 2025 · Updated last year
DigiRL-agent / digiq
☆121 · Apr 8, 2025 · Updated last year
asappresearch / josh-llm-simulation-training
☆31 · Mar 3, 2025 · Updated last year
WindyLee0822 / Process_Q_Model
Official implementation of the paper "Process Reward Model with Q-value Rankings"
☆69 · Feb 5, 2025 · Updated last year
lasgroup / safe-learning
A collection of algorithms and experiment tools for safe sim-to-real transfer in robotics.
☆28 · May 19, 2026 · Updated 2 months ago
mit-han-lab / vcpo
[ICML 2026] Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
☆29 · Apr 27, 2026 · Updated 3 months ago
vl-rewardbench / VL_RewardBench
☆29 · Jul 23, 2025 · Updated last year
BAAI-WuDao / EVA
☆25 · Sep 29, 2021 · Updated 4 years ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆68 · Feb 21, 2025 · Updated last year
rhyang2021 / ARIA
Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".
☆30 · Aug 9, 2025 · Updated 11 months ago
tung-nd / cwbc
☆11 · Oct 3, 2022 · Updated 3 years ago
kyle8581 / Web-Shepherd
[NeurIPS 2025 Spotlight] Official repository for "Web-Shepherd: Advancing PRMs for Reinforcing Web Agents"
☆58 · May 21, 2025 · Updated last year
robjsliwa / pyprolog
Prolog implemented in Python
☆12 · Sep 6, 2024 · Updated last year
yuyq18 / StepTool
☆36 · May 24, 2025 · Updated last year
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
varshakishore / IncDSI
☆11 · Sep 10, 2023 · Updated 2 years ago
BaohaoLiao / frac-cot
[COLM 2026] An efficient 3D sampling method for long-CoT LLM.
☆16 · May 25, 2025 · Updated last year
WeiminXiong / MPO
MPO: Boosting LLM Agents with Meta Plan Optimization (EMNLP 2025 Findings)
☆81 · Aug 20, 2025 · Updated 11 months ago
Callione / LLaVA-MOSS2
Modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model.
☆13 · Sep 19, 2024 · Updated last year