PRIME-RL / TTRL
TTRL: Test-Time Reinforcement Learning
☆806 Updated last month
Alternatives and similar repositories for TTRL
Users who are interested in TTRL are comparing it to the repositories listed below.
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,545 Updated 4 months ago
- A Survey of Reinforcement Learning for Large Reasoning Models ☆1,044 Updated last week
- ✨ Agentic Reinforced Policy Optimization ☆586 Updated last week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,085 Updated 3 weeks ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆271 Updated 7 months ago
- Large Reasoning Models ☆805 Updated 9 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆301 Updated last week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆261 Updated 4 months ago
- [TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆609 Updated this week
- ReasonFlux Series - ReasonFlux, ReasonFlux-PRM and ReasonFlux-Coder ☆485 Updated last month
- A version of verl to support diverse tool use ☆517 Updated this week
- Explore the Multimodal “Aha Moment” on 2B Model ☆608 Updated 6 months ago
- A series of technical reports on Slow Thinking with LLM ☆729 Updated last month
- ☆393 Updated 2 weeks ago
- ☆215 Updated last week
- official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example” ☆357 Updated last week
- ☆1,233 Updated last week
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… ☆888 Updated this week
- Code for the paper: "Learning to Reason without External Rewards" ☆353 Updated 2 months ago
- Official Repo for Open-Reasoner-Zero ☆2,039 Updated 3 months ago
- ☆287 Updated 3 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024) ☆663 Updated 7 months ago
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,018 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆253 Updated 4 months ago
- Latest Advances on Long Chain-of-Thought Reasoning ☆504 Updated 2 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆308 Updated 2 months ago
- A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow. ☆665 Updated last month
- ☆685 Updated last week
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆326 Updated 2 months ago
- Official implementation of paper: SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training ☆292 Updated 4 months ago