ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆140 · Updated 3 weeks ago
Alternatives and similar repositories for code-r1:
Users interested in code-r1 are comparing it to the repositories listed below.
- A Comprehensive Survey on Long Context Language Modeling ☆113 · Updated this week
- Code for the paper "Teaching Language Models to Critique via Reinforcement Learning" ☆84 · Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior ☆216 · Updated this week
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆170 · Updated 3 weeks ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆162 · Updated last week
- Research code for the preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning" ☆78 · Updated 2 weeks ago
- The official repository of the Omni-MATH benchmark ☆78 · Updated 3 months ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset ☆17 · Updated last month
- ☆171 · Updated last month
- ☆144 · Updated 3 months ago
- ☆60 · Updated 4 months ago
- ☆61 · Updated 4 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆138 · Updated 2 weeks ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 6 months ago
- ☆49 · Updated last month
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 · Updated 5 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks 🧮✨ ☆191 · Updated 11 months ago
- ☆79 · Updated last week
- ☆28 · Updated 4 months ago
- On Memorization of Large Language Models in Logical Reasoning ☆59 · Updated 4 months ago
- ☆129 · Updated this week
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆90 · Updated last week
- We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs ☆60 · Updated 5 months ago
- [NeurIPS'24] Official code for "🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving" ☆99 · Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆161 · Updated last week
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆103 · Updated 2 weeks ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision" ☆52 · Updated 4 months ago
- ☆72 · Updated this week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 · Updated 2 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆64 · Updated last week