phonism / CP-Zero

Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.

☆17

Alternatives and similar repositories for CP-Zero

Users who are interested in CP-Zero are comparing it to the libraries listed below.

ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆239 Updated 2 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆135 Updated this week
hkust-nlp / dart-math
[NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving*
☆109 Updated 7 months ago
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆122 Updated 3 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support tool use
☆297 Updated last week
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆101 Updated last week
agentica-project / verl-pipeline
Async pipelined version of Verl
☆110 Updated 3 months ago
microsoft / SWE-bench-Live
🚀 SWE-bench Goes Live!
☆100 Updated last week
GAIR-NLP / OctoThinker
Revisiting Mid-training in the Era of Reinforcement Learning Scaling
☆152 Updated this week
GAIR-NLP / LIMR
☆206 Updated 5 months ago
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆85 Updated 7 months ago
GAIR-NLP / ToRL
☆246 Updated 2 months ago
thu-wyz / inference_scaling
☆71 Updated 8 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆105 Updated 2 months ago
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆157 Updated 4 months ago
hahahawu / Long-to-Short-via-Model-Merging
Model merging is a highly efficient approach for long-to-short reasoning.
☆76 Updated last month
hkust-nlp / llm-compression-intelligence
Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]
☆139 Updated 10 months ago
GAIR-NLP / AIME-Preview
☆71 Updated 4 months ago
OpenSparseLLMs / Linear-MoE
☆112 Updated last month
Zanette-Labs / efficient-reasoning
☆65 Updated 3 months ago
QwenLM / CodeElo
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
☆45 Updated 5 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆215 Updated 4 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆63 Updated 7 months ago
richardodliu / OpenCodeEval
☆36 Updated 2 months ago
SkyworkAI / skywork-o1-prm-inference
☆64 Updated 7 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆74 Updated 4 months ago
Zanette-Labs / SpeculativeRejection
[NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection
☆48 Updated 8 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆230 Updated 2 months ago
yyht / openrlhf_async_pipline
☆61 Updated this week
TIGER-AI-Lab / AceCoder
The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" [ACL25]
☆88 Updated 3 months ago