waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆72 · Updated 2 months ago
Alternatives and similar repositories for Natural-language-RL:
Users who are interested in Natural-language-RL are comparing it to the repositories listed below.
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆116 · Updated 3 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆125 · Updated 2 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ☆29 · Updated 8 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆120 · Updated 3 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆48 · Updated 2 weeks ago
- ☆92 · Updated last month
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference) ☆52 · Updated 4 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆90 · Updated last year
- ☆78 · Updated 7 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆95 · Updated 4 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆103 · Updated last week
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆102 · Updated last year
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 · Updated 5 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆52 · Updated 3 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆42 · Updated 3 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆123 · Updated last month
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆55 · Updated last month
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆72 · Updated last month
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆131 · Updated 10 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆48 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated 11 months ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization" ☆37 · Updated 4 months ago
- [ACL 2024] The project of Symbol-LLM ☆47 · Updated 7 months ago
- GenRM-CoT: Data release for verification rationales ☆46 · Updated 4 months ago
- ☆22 · Updated 8 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 · Updated last year
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 5 months ago
- ☆132 · Updated 2 months ago
- ☆46 · Updated last week
- NeurIPS 2024 tutorial on LLM Inference ☆39 · Updated 2 months ago