ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆201Updated 3 weeks ago
Alternatives and similar repositories for code-r1
Users that are interested in code-r1 are comparing it to the libraries listed below
- A Comprehensive Survey on Long Context Language Modeling☆147Updated 2 weeks ago
- Repo of paper "Free Process Rewards without Process Labels"☆149Updated 2 months ago
- ☆173Updated 2 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆239Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆184Updated 2 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆98Updated 3 weeks ago
- ☆201Updated 3 months ago
- ☆198Updated last week
- ☆293Updated this week
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆179Updated 2 months ago
- ☆63Updated 6 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆213Updated 2 weeks ago
- Async pipelined version of Verl☆91Updated last month
- ☆107Updated last week
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆205Updated this week
- ☆64Updated last month
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆155Updated 2 weeks ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆94Updated 2 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*☆105Updated 5 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆121Updated 2 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆102Updated 4 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆106Updated last month
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆100Updated this week
- "what, how, where, and how well? a survey on test-time scaling in large language models" repository☆41Updated this week
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…☆232Updated this week
- On Memorization of Large Language Models in Logical Reasoning☆65Updated 2 months ago
- ☆193Updated this week
- A version of verl to support tool use☆41Updated this week
- ☆69Updated 6 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆73Updated last month