phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 · Updated 9 months ago
Alternatives and similar repositories for CP-Zero
Users who are interested in CP-Zero are comparing it to the repositories listed below.
- Reproducing R1 for Code with Reliable Rewards ☆286 · Updated 9 months ago
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* ☆120 · Updated last year
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling ☆182 · Updated 6 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆143 · Updated 2 months ago
- Async pipelined version of Verl ☆124 · Updated 10 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25] ☆96 · Updated 10 months ago
- The official implementation for the paper "PENCIL: Long Thoughts with Short Memory". ☆73 · Updated 9 months ago
- CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings ☆65 · Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆55 · Updated last year
- Resources for the Enigmata Project. ☆77 · Updated 5 months ago
- ☆80 · Updated 10 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆246 · Updated 4 months ago
- ☆68 · Updated last year
- ☆50 · Updated 5 months ago
- [NeurIPS 2025 D&B] SWE-bench Goes Live! ☆161 · Updated last week
- A Sober Look at Language Model Reasoning ☆92 · Updated 2 months ago
- ☆78 · Updated last year
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆120 · Updated 9 months ago
- The official repository of the Omni-MATH benchmark. ☆93 · Updated last year
- ☆215 · Updated 11 months ago
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆147 · Updated last year
- SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution ☆104 · Updated 4 months ago
- ☆87 · Updated 5 months ago
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches ☆60 · Updated 11 months ago
- [NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning ☆149 · Updated 4 months ago
- ☆129 · Updated 8 months ago
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction ☆87 · Updated 10 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 · Updated 10 months ago
- ☆352 · Updated 6 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆249 · Updated 9 months ago