YifeiZhou02 / ArCHerLinks

Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"

☆184

Alternatives and similar repositories for ArCHer

Users that are interested in ArCHer are comparing it to the libraries listed below

Sorting:

Yifan-Song793 / ETO
Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
☆147Updated 9 months ago
abdulhaim / LMRL-Gym
☆99Updated last year
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆123Updated 10 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆160Updated 4 months ago
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆319Updated 11 months ago
McGill-NLP / VinePPO
Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment"
☆167Updated 2 months ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆189Updated last year
jwhj / OREO
☆114Updated 6 months ago
Vance0124 / Token-level-Direct-Preference-Optimization
Reference implementation for Token-level Direct Preference Optimization(TDPO)
☆143Updated 5 months ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆107Updated last year
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168Updated last year
Linear95 / APO
Code for ACL2024 paper - Adversarial Preference Optimization (APO).
☆56Updated last year
sanowl / Self-Correcting-LLM--Reinforcement-Learning-
This my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…
☆35Updated 3 weeks ago
sanjibanc / agent_prm
☆43Updated 5 months ago
waterhorse1 / LLM_Tree_Search
(ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training
☆278Updated last year
haotiansun14 / AdaPlanner
AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback
☆111Updated 4 months ago
kanishkg / cognitive-behaviors
☆203Updated 4 months ago
WeiXiongUST / Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning
This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP…
☆28Updated 7 months ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆63Updated 9 months ago
ezelikman / STaR
Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022)
☆206Updated 2 years ago
vwxyzjn / summarize_from_feedback_details
☆147Updated 8 months ago
cmu-mind / RISE
☆32Updated 9 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆100Updated 3 weeks ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆139Updated 8 months ago
karthikv792 / LLMs-Planning
An extensible benchmark for evaluating large language models on planning
☆393Updated last month
ars22 / scaling-LLM-math-synthetic-data
Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"
☆30Updated last year
zankner / CLoud
Critique-out-Loud Reward Models
☆70Updated 9 months ago
OpenDFM / Rememberer
[NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents
☆34Updated last year
xingyaoww / mint-bench
Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha…
☆128Updated last year
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆166Updated 2 months ago