YifeiZhou02 / ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
☆167 Updated 2 weeks ago
Alternatives and similar repositories for ArCHer:
Users who are interested in ArCHer are comparing it to the libraries listed below.
- ☆91 Updated 10 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 Updated 7 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆138 Updated 6 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆54 Updated 11 months ago
- ☆109 Updated 3 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆138 Updated 2 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆153 Updated 5 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆99 Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆306 Updated 9 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆35 Updated last year
- GenRM-CoT: Data release for verification rationales ☆59 Updated 6 months ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆107 Updated last month
- ☆137 Updated 5 months ago
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆266 Updated 11 months ago
- ☆121 Updated this week
- Reasoning with Language Model is Planning with World Model ☆164 Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆136 Updated last year
- Repo of paper "Free Process Rewards without Process Labels" ☆145 Updated last month
- ☆30 Updated 6 months ago
- ☆163 Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 5 months ago
- ☆150 Updated 4 months ago
- ☆64 Updated 5 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆35 Updated last month
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆135 Updated 5 months ago
- Critique-out-Loud Reward Models ☆64 Updated 6 months ago
- ☆132 Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆118 Updated last month
- Natural Language Reinforcement Learning ☆87 Updated 4 months ago
- ☆142 Updated last year