qiufengqijun/open-r1-reprod


A reproduction of open-r1 that runs GRPO training on Qwen models at 0.5B, 1.5B, 3B, and 7B parameters and reports some interesting observations.

☆64

Alternatives and similar repositories for open-r1-reprod

Users who are interested in open-r1-reprod are comparing it to the libraries listed below.

qiufengqijun / mini_qwen
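A project that trains a large language model from scratch, covering pretraining, fine-tuning, and direct preference optimization (DPO). The model has 1B parameters and supports Chinese and English.
☆860 · Feb 18, 2025 · Updated last year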
Dylan9897 / LLM-TextClassification
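Integrates advanced large language models such as Qwen and DeepSeek, supporting both a pure LLM + classification layer mode and an LLM + LoRA + classification layer mode. It is designed and trained modularly with transformers, so components can be adjusted or replaced as needed.
☆21 · Sep 1, 2025 · Updated 10 months ago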
pihang / LLM_Learning_ph
Organized notes on pretraining an LLM from scratch, SFT, RLHF, and DPO, plus interview questions.
☆21 · Sep 2, 2024 · Updated last year
THU-LYJ-Lab / InstructMotion
☆26 · Nov 26, 2024 · Updated last year
sheep333c / DIVE
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
☆26 · Mar 13, 2026 · Updated 4 months ago
wyshi / sdp_transformers
☆12 · Jan 5, 2023 · Updated 3 years ago
jncsnlp / FSL-Multimodal-Rumor-Detection
☆11 · Feb 23, 2023 · Updated 3 years ago
a28293971 / ResNet_CRNN_OCR
This repo is used to train and run an OCR model that is based on the original CRNN, with its backbone changed to ResNet34.
☆10 · Jan 15, 2021 · Updated 5 years ago
Exorust / LLM-movie-recommender
Finetuned Mistral that suggests Movies!
☆11 · Jan 4, 2024 · Updated 2 years ago
ShwStone / TRex-PPO
Run TRex with PPO
☆38 · May 17, 2025 · Updated last year
TangTao-PKU / DGTR
[IROS 2024 Oral Pitch] PyTorch Implementation of "Dual-Branch Graph Transformer Network for 3D Human Mesh Reconstruction from Video"
☆15 · Jul 19, 2024 · Updated 2 years ago
makotu1208 / Otto-kaggle-solution-makotupart
Kaggle: Otto competition
☆24 · Feb 13, 2023 · Updated 3 years ago
ARiSE-Lab / CYCLE_OOPSLA_24
Open-source repository for the OOPSLA'24 paper "CYCLE: Learning to Self-Refine Code Generation"
☆10 · Mar 8, 2024 · Updated 2 years ago
iscyy / External-Attention-pytorch
🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.…
☆17 · Feb 12, 2023 · Updated 3 years ago
danaesavi / ImageChain
This repository is associated with the research paper titled ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large…
☆15 · Jun 4, 2025 · Updated last year
rllab-snu / Spectral-Risk-Constrained-RL
Official Github Repository for "Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees". (NeurIPS 2024)
☆11 · Nov 30, 2025 · Updated 7 months ago
jzhang38 / LPD
Code for the EMNLP 2022 paper "Better Few-Shot Relation Extraction with Label Prompt Dropout"
☆26 · Nov 8, 2024 · Updated last year
Timing04 / Awesome-LLM-Tech-Reports-Notes
Some notes on open-source LLM technical reports
☆20 · Mar 6, 2026 · Updated 4 months ago
yigu1008 / Diffusion-RPO
☆15 · Mar 30, 2025 · Updated last year
PLUTO-SCY / Python-Face_Recognition
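Tsinghua University Department of Electronic Engineering, freshman-year summer term Python course project: a very rudimentary machine-learning-based face recognition system.
☆10 · Sep 2, 2022 · Updated 3 years ago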
sail-sg / AnytimeReasoner
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
☆54 · Jul 15, 2025 · Updated last year
dhcode-cpp / grpo-loss
☆44 · Mar 6, 2025 · Updated last year
SimonZhan-code / Step-Wise_SafeRL_Pixel
Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations"
☆11 · Apr 5, 2024 · Updated 2 years ago
mahaitongdae / Safety_Index_Synthesis
Code for L4DC 2022 paper: Joint Synthesis of Safety Certificate and Safe Control Policy Using Constrained Reinforcement Learning.
☆15 · Jul 31, 2023 · Updated 2 years ago
CanvaChen / chinese-llama-tokenizer
Goal: build a small, elegant, and more linguistically sound llama tokenizer that supports Chinese, English, and Japanese.
☆19 · Jun 2, 2024 · Updated 2 years ago
guangzhouzhong / NEWSREC
2020 Tianchi Competition News Recommendation
☆11 · Jan 26, 2021 · Updated 5 years ago
sylvain-wei / 24-Game-Reasoning
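A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24 Game as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's ability to verify and reflect on its own answers. About Clean, minimal, accessible reproduction of Dee…
☆35 · Apr 5, 2025 · Updated last year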
Ji-yutong / Intelligent-Q-A-System-for-Automotive-Knowledge
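An intelligent dialogue system based on ChatGLM3-6b that integrates RAG, knowledge graphs, agents, and multimodal techniques to improve the quality of the model's responses.
☆69 · Aug 12, 2024 · Updated last year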
Jianglin954 / awesome-on-policy-distillation
A curated list of resources on on-policy distillation
☆25 · Apr 13, 2026 · Updated 3 months ago
zhyang2226 / AR-Lopti
[ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs
☆46 · May 20, 2025 · Updated last year
SLIT-AI / WRPO
[ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion
☆14 · Mar 17, 2025 · Updated last year
jiahe7ay / MINI_LLM
This is a repository used by individuals to experiment and reproduce the pre-training process of LLM.
☆504 · May 1, 2025 · Updated last year
wang8740 / MAP
Documentation at
☆14 · Mar 27, 2025 · Updated last year
RubenBranco / SokobanAlphaGo
AlphaGo Zero Reinforcement Learning Sokoban Solver
☆11 · Jun 20, 2018 · Updated 8 years ago
intelligent-control-lab / Implicit_Safe_Set_Algorithm
☆15 · Aug 7, 2025 · Updated 11 months ago
wlll123456 / study_rlhf
☆108 · Jul 24, 2025 · Updated 11 months ago
Ahmedfir / mBERTa
CodeBERT based mutation testing tool.
☆13 · Nov 10, 2025 · Updated 8 months ago
gaiusyu / Denum
A log compression tool (ASE2024)
☆17 · Apr 15, 2025 · Updated last year
iSEngLab / LLM4UT_Empirical
[ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language Models for Unit Testing
☆13 · Feb 9, 2025 · Updated last year