ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆218Updated last month
Alternatives and similar repositories for code-r1
Users who are interested in code-r1 are comparing it to the libraries listed below
- A Comprehensive Survey on Long Context Language Modeling☆151Updated 2 weeks ago
- ☆220Updated 3 weeks ago
- A version of verl to support tool use☆251Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆240Updated 2 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆153Updated 3 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆98Updated last month
- ☆202Updated 4 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆122Updated this week
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆191Updated this week
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆158Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆189Updated 3 months ago
- Async pipelined version of Verl☆100Updated 2 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond☆250Updated last week
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆220Updated last month
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆108Updated 6 months ago
- ☆63Updated 6 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆240Updated 2 weeks ago
- ☆297Updated 3 weeks ago
- ☆116Updated last month
- "what, how, where, and how well? a survey on test-time scaling in large language models" repository☆44Updated this week
- ☆231Updated last week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆186Updated 3 months ago
- 🚀 SWE-bench Goes Live!☆65Updated last week
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation☆50Updated 2 months ago
- ☆233Updated last month
- ☆69Updated 7 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆84Updated 3 months ago
- slime is an LLM post-training framework aimed at scaling RL.☆328Updated this week
- ☆104Updated 2 weeks ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆226Updated last year