sail-sg / oat-zeroLinks

A lightweight reproduction of DeepSeek-R1-Zero with indepth analysis of self-reflection behavior.

☆248

Alternatives and similar repositories for oat-zero

Users that are interested in oat-zero are comparing it to the libraries listed below

Sorting:

GAIR-NLP / LIMR
☆213Updated 9 months ago
eddycmu / demystify-long-cot
☆327Updated 6 months ago
LCLM-Horizon / A-Comprehensive-Survey-For-Long-Context-Language-Modeling
A Comprehensive Survey on Long Context Language Modeling
☆209Updated last week
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆267Updated last year
kanishkg / cognitive-behaviors
☆216Updated 8 months ago
cmu-l3 / l1
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
☆257Updated 6 months ago
GAIR-NLP / ToRL
☆316Updated 6 months ago
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆275Updated 7 months ago
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆327Updated last year
IAAR-Shanghai / xVerify
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
☆140Updated 3 weeks ago
SuperGPQA / SuperGPQA
☆173Updated 7 months ago
TIGER-AI-Lab / verl-tool
A version of verl to support diverse tool use
☆714Updated last week
QwenLM / ProcessBench
Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"
☆179Updated 6 months ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆267Updated last year
PRIME-RL / ImplicitPRM
Repo of paper "Free Process Rewards without Process Labels"
☆167Updated 8 months ago
InternLM / OREAL
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
☆190Updated 8 months ago
ElliottYan / LUFFY
Official Repository of "Learning to Reason under Off-Policy Guidance"
☆380Updated 2 months ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆256Updated 11 months ago
CMU-AIRe / MRT
Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".
☆116Updated 4 months ago
dvlab-research / Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
☆388Updated 10 months ago
zwhe99 / DeepMath
A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
☆277Updated 2 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆240Updated 2 months ago
tongyx361 / Awesome-LLM4Math
Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit…
☆145Updated last year
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆263Updated 4 months ago
RUCAIBox / Slow_Thinking_with_LLMs
A series of technical report on Slow Thinking with LLM
☆748Updated 3 months ago
ltzheng / SimpleTIR
End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
☆330Updated 2 months ago
MARIO-Math-Reasoning / Super_MARIO
☆341Updated 6 months ago
liziniu / ReMax
Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)
☆198Updated last year
QwenLM / AutoIF
☆315Updated last year
TsinghuaC3I / Unify-Post-Training
Towards a Unified View of Large Language Model Post-Training
☆187Updated 2 months ago