thunlp / JustRL
☆68 Updated last month
Alternatives and similar repositories for JustRL
Users who are interested in JustRL are comparing it to the repositories listed below.
- ☆86 Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆141 Updated last month
- ☆85 Updated 8 months ago
- ☆208 Updated last month
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆119 Updated 7 months ago
- MiroRL is an MCP-first reinforcement learning framework for deep research agents. ☆183 Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆190 Updated 9 months ago
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆131 Updated last month
- ☆91 Updated 2 weeks ago
- ☆107 Updated 3 months ago
- ☆65 Updated last year
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆80 Updated 2 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆95 Updated 8 months ago
- ☆213 Updated 10 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆118 Updated 7 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆163 Updated 2 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆222 Updated 4 months ago
- MiroMind-M1 is a fully open-source series of reasoning language models built on Qwen-2.5, focused on advancing mathematical reasoning. ☆245 Updated 4 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆188 Updated 5 months ago
- ☆70 Updated last year
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆137 Updated last year
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 Updated last year
- ☆124 Updated 6 months ago
- [AAAI 2026] Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆92 Updated last month
- RL Scaling and Test-Time Scaling (ICML'25) ☆112 Updated 10 months ago
- ☆112 Updated 3 months ago
- The official repository of the Omni-MATH benchmark. ☆88 Updated 11 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆215 Updated 3 weeks ago
- Towards a Unified View of Large Language Model Post-Training