jianzhnie/Open-R1

An open-source implementation (reproduction) of DeepSeek-R1.

☆275

Alternatives and similar repositories for Open-R1

Users who are interested in Open-R1 are comparing it to the repositories listed below.

sunzeyeah / RLHF
Implementation of Chinese ChatGPT
☆287 · Nov 20, 2023 · Updated 2 years ago
BorealisAI / DT-Fixup
Optimizing Deeper Transformers on Small Datasets https://arxiv.org/abs/2012.15355
☆16 · Nov 2, 2022 · Updated 3 years ago
jackfsuia / nanoRLHF
RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and reproducing DeepSeek R1-Zero.
☆80 · Feb 19, 2025 · Updated last year
jianzhnie / LLamaTuner
Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.
☆620 · Jan 24, 2025 · Updated last year
flippe3 / chat-ltu
Open source implementation of InstructGPT (not finished)
☆31 · Apr 13, 2023 · Updated 3 years ago
jasonvanf / llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
☆240 · Aug 17, 2025 · Updated 11 months ago
stanleylsx / llms_tool
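A HuggingFace-based tool for training and testing large language models. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), and model merging and quantization.
☆226 · Dec 8, 2023 · Updated 2 years ago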
zxuu / RLHF
Implementations and study notes for RLHF algorithms used with LLMs.
☆15 · Apr 13, 2025 · Updated last year
Edresson / Coqui-TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
☆37 · Mar 10, 2022 · Updated 4 years ago
datawhalechina / unlock-deepseek
Interpretation, extension, and reproduction of the DeepSeek series of work.
☆733 · Mar 9, 2026 · Updated 4 months ago
SupritYoung / RLHF-Label-Tool
A tool for manually annotating and ranking response data in the RLHF stage of large-model training.
☆254 · Aug 1, 2023 · Updated 2 years ago
Adaxry / get_aligned_BERT_emb
Get the aligned BERT embedding for sequence labeling tasks
☆18 · Jun 6, 2019 · Updated 7 years ago
l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆118 · Jun 5, 2023 · Updated 3 years ago
shuyhere / all-about-llm
A survey of large language model training and serving.
☆37 · Aug 4, 2023 · Updated 2 years ago
Xianchao-Wu / wekws
Production First and Production Ready End-to-End Keyword Spotting Toolkit
☆12 · May 30, 2022 · Updated 4 years ago
hiyouga / ChatGLM-Efficient-Tuning
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
☆3,719 · Oct 12, 2023 · Updated 2 years ago
jianzhnie / pyramidbox_pytorch
A PyTorch implementation of the PyramidBox face detection model, with some modules of the original code modified to be simpler and more efficient.
☆22 · Dec 8, 2020 · Updated 5 years ago
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese dialogue LLM)
☆8,273 · Oct 16, 2024 · Updated last year
Knight1112D / CBC_Pi0.7_Openpi
Unofficial OpenPI extension experiment to build a more complete OpenPI-style VLA engineering stack: pi0.5 semantics, RTC, pi0.6 RECAP/MEM…
☆15 · Jul 8, 2026 · Updated last week
ffaltings / InteractiveTextGeneration
☆34 · Mar 25, 2023 · Updated 3 years ago
xrsrke / instructGOOSE
Implementation of Reinforcement Learning from Human Feedback (RLHF)
☆172 · Apr 7, 2023 · Updated 3 years ago
WangRongsheng / MedQA-ChatGLM
🛰️ Fine-tuning ChatGLM with LoRA, P-Tuning V2, Freeze, RLHF, and other methods on real medical dialogue data; our vision goes beyond medical Q&A.
☆339 · Sep 2, 2023 · Updated 2 years ago
rania-hossam / LLAMA_FROM_SCRATCH_PYTORCH
☆16 · Oct 24, 2023 · Updated 2 years ago
vyomakesh09 / longagent
LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration
☆11 · Mar 11, 2024 · Updated 2 years ago
voidful / TextRL
Implementation of ChatGPT RLHF (Reinforcement Learning with Human Feedback) on any generation model in huggingface's transformer (blommz-…
☆564 · Apr 23, 2026 · Updated 2 months ago
Haotianz94 / smpl_visualizer
☆13 · Sep 20, 2023 · Updated 2 years ago
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,611 · Nov 24, 2025 · Updated 7 months ago
sylvain-wei / 24-Game-Reasoning
A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the 24 game as an example. Uses zero-RL, SFT, and SFT+RL to elicit the LLM's ability to self-verify and reflect. Clean, minimal, accessible reproduction of Dee…
☆35 · Apr 5, 2025 · Updated last year
deepspeedai / DeepSpeedExamples
Example models using DeepSpeed
☆6,832 · Updated this week
zhanghaok / BERT-CRF-NER
A named entity recognition model based on BERT-CRF.
☆13 · Mar 14, 2022 · Updated 4 years ago
OpenLMLab / MOSS-RLHF
Secrets of RLHF in Large Language Models Part I: PPO
☆1,427 · Mar 3, 2024 · Updated 2 years ago
Facico / Chinese-Vicuna
Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese llama+lora solution, with a structure modeled on alpaca.
☆4,119 · Apr 18, 2025 · Updated last year
shibing624 / lmft
ChatGLM-6B fine-tuning.
☆135 · Apr 25, 2023 · Updated 3 years ago
conceptofmind / LaMDA-rlhf-pytorch
Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT.
☆467 · Feb 24, 2024 · Updated 2 years ago
miranthajayatilake / nanoQA2
ChatGPT on your own data
☆22 · Apr 18, 2023 · Updated 3 years ago
alibaba-damo-academy / VL-Cogito
☆24 · Nov 4, 2025 · Updated 8 months ago
jackaduma / ChatGLM-LoRA-RLHF-PyTorch
A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Huma…
☆138 · Apr 28, 2023 · Updated 3 years ago
Miraclemarvel55 / ChatGLM-RLHF
Use RLHF directly on ChatGLM to raise or lower the probability of target outputs | Modify ChatGLM output with only RLHF
☆196 · May 23, 2023 · Updated 3 years ago
ZhenZHAO / DPMS
Rethinking Data Perturbation and Model Stabilization for Semi-supervised Medical Image Segmentation
☆15 · Jul 3, 2026 · Updated 2 weeks ago