YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆160 Updated last year
Alternatives and similar repositories for ArCHer:
Users interested in ArCHer are comparing it to the repositories listed below.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆133 Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 7 months ago
- ☆105 Updated 2 months ago
- ☆87 Updated 9 months ago
- Natural Language Reinforcement Learning ☆86 Updated 3 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆150 Updated 5 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆301 Updated 8 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆181 Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆140 Updated last month
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆133 Updated 2 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆53 Updated 10 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆132 Updated 4 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆97 Updated last year
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆106 Updated 2 weeks ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆264 Updated 10 months ago
- Reasoning with Language Model is Planning with World Model ☆163 Updated last year
- ☆137 Updated 4 months ago
- ☆142 Updated 11 months ago
- GenRM-CoT: Data release for verification rationales ☆56 Updated 6 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 5 months ago
- Critique-out-Loud Reward Models ☆57 Updated 5 months ago
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP… ☆25 Updated 4 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆57 Updated 5 months ago
- ☆30 Updated 5 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆101 Updated 4 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆131 Updated 6 months ago
- ☆148 Updated 4 months ago
- ☆55 Updated last month
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆25 Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 4 months ago