ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆188 Updated 2 weeks ago
Alternatives and similar repositories for code-r1:
Users who are interested in code-r1 are comparing it to the repositories listed below
- A Comprehensive Survey on Long Context Language Modeling ☆138 Updated last month
- ☆192 Updated 2 months ago
- ☆153 Updated last month
- ☆64 Updated 5 months ago
- ☆115 Updated 2 weeks ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆94 Updated 3 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆175 Updated last month
- Repo of paper "Free Process Rewards without Process Labels" ☆145 Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆195 Updated last month
- Async pipelined version of Verl ☆66 Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆234 Updated 3 weeks ago
- ☆163 Updated last month
- ☆59 Updated 3 weeks ago
- ☆151 Updated 4 months ago
- ☆63 Updated 5 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆179 Updated 2 months ago
- ☆96 Updated this week
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆105 Updated 4 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆17 Updated 2 weeks ago
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆89 Updated 2 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆209 Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆94 Updated last month
- ☆287 Updated last month
- ☆138 Updated last week
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆205 Updated last week
- The related works and background techniques about OpenAI o1 ☆221 Updated 4 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" ☆81 Updated last month
- ☆80 Updated 3 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 Updated 5 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆71 Updated last week