thunlp / JustRL
☆185Updated last week
Alternatives and similar repositories for JustRL
Users who are interested in JustRL are comparing it to the libraries listed below.
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆190Updated 9 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆187Updated 6 months ago
- ☆346Updated 5 months ago
- repo for paper https://arxiv.org/abs/2504.13837☆310Updated 3 weeks ago
- ☆118Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆257Updated 7 months ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆281Updated 3 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆228Updated 2 months ago
- The official repository of the paper "Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models"☆110Updated 4 months ago
- Towards a Unified View of Large Language Model Post-Training☆199Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25]☆210Updated last month
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details.☆222Updated 5 months ago
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation☆118Updated 7 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆142Updated last month
- End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning☆348Updated 3 months ago
- ☆215Updated 10 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆146Updated 9 months ago
- ☆84Updated 9 months ago
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning.☆247Updated 4 months ago
- ☆109Updated 3 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments☆165Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs☆198Updated last month
- A set of examples based on verl for end-to-end RL training recipes.☆108Updated this week
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't"☆271Updated 2 months ago
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling☆182Updated 5 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling☆467Updated 7 months ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆114Updated 5 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆136Updated 3 weeks ago
- Code for the paper: "Learning to Reason without External Rewards"☆385Updated 6 months ago
- An efficient GRPO training util.☆50Updated 6 months ago