SiliangZeng / Multi-Turn-RL-Agent
☆113 Jun 11, 2025 Updated 8 months ago
Alternatives and similar repositories for Multi-Turn-RL-Agent
Users that are interested in Multi-Turn-RL-Agent are comparing it to the libraries listed below
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆261 May 5, 2025 Updated 9 months ago
- the datasets of our paper ☆11 Feb 26, 2024 Updated last year
- ☆13 Aug 4, 2025 Updated 6 months ago
- ☆17 May 3, 2025 Updated 9 months ago
- SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data ☆21 Jan 24, 2026 Updated 3 weeks ago
- ☆283 Aug 12, 2025 Updated 6 months ago
- EMNLP MAIN 2025 StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization ☆59 Sep 13, 2025 Updated 5 months ago
- Official repository for "RLVR-World: Training World Models with Reinforcement Learning" (NeurIPS 2025), https://arxiv.org/abs/2505.13934 ☆214 Oct 28, 2025 Updated 3 months ago
- ☆39 Jul 25, 2024 Updated last year
- ☆20 Apr 24, 2025 Updated 9 months ago
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs? ☆19 Mar 9, 2025 Updated 11 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 Oct 9, 2024 Updated last year
- A holistic benchmark for LLM abstention ☆69 Aug 27, 2025 Updated 5 months ago
- Code for the paper: Dense Reward for Free in Reinforcement Learning from Human Feedback (ICML 2024) by Alex J. Chan, Hao Sun, Samuel Holt… ☆38 Aug 11, 2024 Updated last year
- ☆118 Feb 4, 2026 Updated last week
- Code for the paper "RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection" (ACL'25). ☆33 Jul 23, 2025 Updated 6 months ago
- ☆31 Aug 7, 2025 Updated 6 months ago
- Code for the paper "ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation" (EMNLP'2… ☆17 Dec 11, 2024 Updated last year
- ☆271 Jan 29, 2026 Updated 2 weeks ago
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei…