NovaSky-AI / SkyRL
SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning
☆422Updated this week
Alternatives and similar repositories for SkyRL
Users who are interested in SkyRL are comparing it to the libraries listed below.
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆219Updated last month
- ☆297Updated 3 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆379Updated last week
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]☆486Updated last month
- A version of verl to support tool use☆251Updated this week
- ☆220Updated 3 weeks ago
- Scalable toolkit for efficient model reinforcement☆438Updated this week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆220Updated last month
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆290Updated this week
- ☆773Updated last month
- Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"☆545Updated 3 months ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆236Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆240Updated 2 months ago
- Code for the paper: "Learning to Reason without External Rewards"☆295Updated this week
- ☆169Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆303Updated last month
- Repo of paper "Free Process Rewards without Process Labels"☆153Updated 3 months ago
- Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"☆149Updated last week
- Tina: Tiny Reasoning Models via LoRA☆258Updated 3 weeks ago
- ☆233Updated 3 weeks ago
- A series of technical reports on Slow Thinking with LLM☆699Updated 2 weeks ago
- Async pipelined version of Verl☆100Updated 2 months ago
- ☆119Updated last month
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling☆395Updated last month
- Reproducing R1 for Code with Reliable Rewards☆218Updated last month
- ☆202Updated 4 months ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆220Updated 2 weeks ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆639Updated 5 months ago
- RewardBench: the first evaluation tool for reward models.☆604Updated last week
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆212Updated last month