waylandzhang/DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k

waylandzhang / DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k

☆85

Alternatives and similar repositories for DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k

Users who are interested in DeepSeek-RL-Qwen-0.5B-GRPO-gsm8k are comparing it to the repositories listed below.

waylandzhang / alphago_zero_from_scratch
AlphaGo Zero implemented from scratch
☆17 · Nov 10, 2024 · Updated last year
waylandzhang / embedding_from_scratch
Train your own Chinese embedding model
☆28 · Jan 6, 2025 · Updated last year
826568389 / GRPO-R1
☆13 · Mar 16, 2025 · Updated 11 months ago
chaoql / CCF-AIOps-Code
2024 CCF International AIOps Challenge, Track 2 (GLM4): a shared solution for the retrieval-augmented operations knowledge Q&A challenge.
☆14 · Jul 5, 2024 · Updated last year
waylandzhang / train_tokenizer
A demonstration of how to train a custom tokenizer similar to TikToken.
☆15 · Jan 6, 2025 · Updated last year
Plumess / MimirFW
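(Work in progress) This project aims to develop a framework for building dialogue games on top of large language models (LLMs). It supports rapid design and intelligent NPC construction for dialogue-driven games such as DND (Dungeons & Dragons), Werewolf, and text adventures, and improves the LLM response experience in such games.
☆17 · Dec 2, 2025 · Updated 3 months ago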
shareAI-lab / alignment-handbook-cn
Chinese edition of hf-alignment-handbook: a complete set of LLM training tutorials covering SFT, DPO, ORPO, and CPT.
☆14 · Aug 25, 2024 · Updated last year
tianchiguaixia / ocr_recognition
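Fine-tunes Alibaba's open-source text detection model, using OCR results returned by the Hehe (合合) recognition service as initial training data, to adapt the model to a specific scenario of 10,000 images and improve text recognition accuracy.
☆10 · Dec 9, 2024 · Updated last year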
Miaque / langmanus
☆12 · Mar 28, 2025 · Updated 11 months ago
tianchiguaixia / qwen1.5-ner
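Fine-tunes the Qwen1.5-0.5B-Chat model on general information extraction tasks. Goals: evaluate the generative approach against extractive NER; give beginners a simple fine-tuning workflow with as little code as possible; and handle data formatting for LLM training.
☆15 · Sep 6, 2024 · Updated last year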
Ginjing-Yuan / QWen2-from_ground_up
☆22 · Jul 15, 2024 · Updated last year
kossisoroyce / train_grpo.py
GRPO Training Script for Qwen Model on GSM8K Dataset. This script trains a Qwen model using the GRPO (Generalized Reinforcement Policy Op…
☆28 · Dec 11, 2025 · Updated 2 months ago
ztxz16 / exvllm
A hybrid-inference extension plugin for vllm that supports multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model can reach 1000+ prefill
☆31 · Nov 7, 2025 · Updated 4 months ago
rkuo2000 / GenAI
☆11 · Updated this week
ZejunCao / bilibili_code
Courseware code used in bilibili video lectures
☆26 · Mar 3, 2026 · Updated last week
Hemanthkumar2112 / Reward-Modeling-RLHF-Finetune-and-RAG
Gemma2 (9B), Llama3-8B-Finetune-and-RAG: sample code base, implemented on the Kaggle platform
☆22 · Feb 8, 2025 · Updated last year
Jisencc / yolov7-keypoint-customization
Revision of the official yolov7-pose to support custom datasets for keypoint detection
☆11 · Nov 12, 2023 · Updated 2 years ago
gigio1023 / LLMCompiler-Pro
An extended project of the LLM Compiler paper, focusing on developing LLM-based Autonomous Agents.
☆26 · Oct 22, 2024 · Updated last year
aws-samples / training-llm-on-sagemaker-for-multiple-nodes-with-deepspeed
☆26 · Mar 21, 2024 · Updated last year
tejasmagia / DetectCarParkingSlot_Contest
Detecting car parking slots in open car park spaces
☆13 · Oct 21, 2019 · Updated 6 years ago
zRzRzRzRzRzRzR / lm-fly
Accelerating LLM inference frameworks: make LLMs fly
☆24 · May 10, 2024 · Updated last year
waylandzhang / learn-reinforcement-learning
Study notes and video-sharing notes for the book "Reinforcement Learning"
☆78 · Apr 1, 2025 · Updated 11 months ago
LivingFutureLab / ChineseSimpleQA
☆76 · Jan 24, 2025 · Updated last year
dhcode-cpp / grpo-loss
☆42 · Mar 6, 2025 · Updated last year
GenerativeAgents / dify-book
A complete introduction to building generative AI apps with Dify (Difyで作る生成AIアプリ完全入門)
☆17 · May 25, 2025 · Updated 9 months ago
barkain / claude-code-workflow-orchestration
☆26 · Feb 28, 2026 · Updated last week
6zzhh6 / WeChat_Formatting_Tool
A simple WeChat Official Account layout tool based on Dify
☆17 · Jun 27, 2025 · Updated 8 months ago
aws-samples / sample-data-analyst-bi
A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…
☆28 · Feb 13, 2026 · Updated 3 weeks ago
HugoPalomares / design-intent-for-sdd
☆28 · Dec 4, 2025 · Updated 3 months ago
xxxxZhou / Road-occupancy-operation-and-vehicle-illegal-parking-detection
Uses yolov5 to detect road-occupation operations and vehicle parking violations on urban streets, and can independently delin…
☆12 · Jan 2, 2023 · Updated 3 years ago
michellebonat / tf_text_classify
The classic movies redux with machine learning using TensorFlow and Keras.
☆11 · Feb 12, 2019 · Updated 7 years ago
majinkai / dify-database-to-knowledge
Write the database metadata into the dify knowledge
☆12 · Dec 30, 2025 · Updated 2 months ago
c00cjz00 / llmservice_ip
☆11 · Aug 29, 2025 · Updated 6 months ago
zhibaishouheilab / HealthiVert-GAN
HealthiVert-GAN, a novel deep-learning framework designed to generate pseudo-healthy vertebral images. These images simulate the pre-frac…
☆11 · Nov 3, 2025 · Updated 4 months ago
yanivle / fast_minbpe
☆17 · Feb 6, 2025 · Updated last year
KenKaiii / b0t
Workflow automation, but you just describe what you want and it happens.
☆27 · Nov 22, 2025 · Updated 3 months ago
wqw547243068 / wqw547243068.github.io
Blog information
☆42 · Mar 3, 2026 · Updated last week
stulogy / vibe-prd
This is a fork from Ryan Carson's AI Dev Tasks repository, with some code cleanup and refactoring to enable support for PostgreSQL databa…
☆15 · Sep 8, 2025 · Updated 6 months ago
chengang95 / UnKD
☆14 · Jun 15, 2023 · Updated 2 years ago