ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆167 Updated last week
Alternatives and similar repositories for code-r1:
Users who are interested in code-r1 are comparing it to the libraries listed below.
- A Comprehensive Survey on Long Context Language Modeling ☆129 Updated 3 weeks ago
- Async pipelined version of Verl ☆54 Updated last week
- ☆62 Updated 4 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆141 Updated last month
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆90 Updated last week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆222 Updated 2 weeks ago
- ☆184 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆180 Updated last month
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆175 Updated last month
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆90 Updated last month
- ☆107 Updated 2 weeks ago
- ☆148 Updated 4 months ago
- ☆89 Updated 3 weeks ago
- ☆50 Updated this week
- On Memorization of Large Language Models in Logical Reasoning ☆63 Updated 3 weeks ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆101 Updated 4 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆169 Updated 3 weeks ago
- ☆62 Updated 4 months ago
- Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents ☆58 Updated last week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆199 Updated 11 months ago
- A comprehensive collection of process reward models. ☆53 Updated last week
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆122 Updated 9 months ago
- ☆41 Updated last week
- ☆75 Updated 3 weeks ago
- ☆278 Updated last month
- The official repository of the Omni-MATH benchmark. ☆80 Updated 3 months ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆181 Updated 6 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 4 months ago
- ☆91 Updated last month
- A research repo for experiments about Reinforcement Finetuning ☆43 Updated last week