McGill-NLP / agent-reward-benchLinks

AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

☆40

Alternatives and similar repositories for agent-reward-bench

Users that are interested in agent-reward-bench are comparing it to the libraries listed below

Sorting:

ByteDance-Seed / WideSearch
WideSearch: Benchmarking Agentic Broad Info-Seeking
☆102Updated last month
Open-Source-O1 / o1_Reasoning_Patterns_Study
☆105Updated 11 months ago
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
☆135Updated 2 months ago
LAMDASZ-ML / Self-Backtracking
☆51Updated 9 months ago
bobxwu / learning-from-rewards-llm-papers
A comrephensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…
☆58Updated 5 months ago
thu-coai / SPaR
☆46Updated 5 months ago
lichengliu03 / unary-feedback
☆38Updated 3 months ago
ReasoningTransfer / Transferability-of-LLM-Reasoning
☆104Updated last month
test-time-interaction / TTI
☆65Updated 5 months ago
jwliao-ai / MARFT
☆72Updated 3 weeks ago
yyht / openrlhf_async_pipline
☆86Updated 3 months ago
satori-reasoning / Satori
[ICML 2025] Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
☆109Updated 6 months ago
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆180Updated 4 months ago
hzy312 / knowledge-r1
IKEA: Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
☆67Updated 6 months ago
efficientscaling / Z1
[EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
☆67Updated 7 months ago
DualityRL / multi-attempt
☆19Updated 8 months ago
weizhepei / WebAgent-R1
[EMNLP 2025] WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
☆60Updated last month
GAIR-NLP / PC-Agent-E
Efficient Agent Training for Computer Use
☆133Updated 3 months ago
mukhal / ThinkPRM
Process Reward Models That Think
☆63Updated this week
TsinghuaC3I / SSRL
SSRL: Self-Search Reinforcement Learning
☆157Updated 3 months ago
multimodal-art-projection / TreePO
☆50Updated last month
dvlab-research / ARPO
Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
☆138Updated 6 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆115Updated 5 months ago
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆94Updated 7 months ago
yongchao98 / PROMST
Automatic prompt optimization framework for multi-step agent tasks.
☆36Updated last year
THUDM / T1
RL Scaling and Test-Time Scaling (ICML'25)
☆112Updated 10 months ago
GuanghaoYe / Emergence-of-Thinking
☆53Updated 9 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆118Updated 6 months ago
cs-holder / Reasoning-Self-Evolution-Survey
☆51Updated 8 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆83Updated 8 months ago