waterhorse1 / Natural-language-RL
Natural Language Reinforcement Learning
☆92 Updated last week
Alternatives and similar repositories for Natural-language-RL
Users who are interested in Natural-language-RL are comparing it to the repositories listed below
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra… ☆53 Updated 3 weeks ago
- ☆114 Updated 6 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆139 Updated 8 months ago
- ☆47 Updated 5 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆48 Updated 8 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆107 Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆147 Updated 9 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆109 Updated 6 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆82 Updated 2 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆103 Updated last week
- ☆53 Updated 5 months ago
- ☆32 Updated 9 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆60 Updated 5 months ago
- Official Implementation of ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay ☆99 Updated 2 months ago
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆69 Updated 3 months ago
- Reinforced Multi-LLM Agents training ☆32 Updated last month
- ☆47 Updated last month
- ☆60 Updated 4 months ago
- ☆61 Updated last week
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024) ☆70 Updated last year
- Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆103 Updated 2 weeks ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆184 Updated 3 months ago
- ☆43 Updated 5 months ago
- [ICLR 2024] Trajectory-as-Exemplar Prompting with Memory for Computer Control ☆59 Updated 6 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆160 Updated 4 months ago
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning ☆158 Updated last week
- Code for "Reasoning to Learn from Latent Thoughts" ☆114 Updated 4 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆100 Updated 3 weeks ago
- ☆99 Updated last year
- [ACL 2024] The project of Symbol-LLM ☆56 Updated last year