liuchen6667/qwen_grpo_gsm8k

liuchen6667 / qwen_grpo_gsm8k

Simple, easy-to-understand code for strengthening math ability on Qwen using GRPO

☆58

Alternatives and similar repositories for qwen_grpo_gsm8k

Users who are interested in qwen_grpo_gsm8k are comparing it to the repositories listed below.

liuchen6667 / qwen2.5_sft_kd
Fine-tuning and knowledge distillation for Qwen2.5
☆17 · Dec 24, 2024 · Updated last year
xiaomi-research / guievalkit
[ICML 2026] GUIEvalKit: Open-source Evaluation Toolkit for GUI Agents
☆23 · Feb 26, 2026 · Updated 4 months ago
jiaolifengmi / VQ-Prompt
Official PyTorch code for "Vector Quantization Prompting for Continual Learning (NeurIPS2024)".
☆11 · Oct 16, 2024 · Updated last year
nfsrules / qwen2.5VL-R1
QWEN 2.5VL-R1: Multimodal reasoning model for action recognition in videos (Experimental GRPO with LoRA support)
☆25 · Oct 9, 2025 · Updated 9 months ago
astordu / agent_from_scratch
Builds from scratch the most important capability of an agent: function calling
☆18 · Oct 16, 2024 · Updated last year
ZhAnGToNG1 / MSFC-Net
Multi-Scale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery
☆12 · Jul 17, 2022 · Updated 4 years ago
msetzu / glocalx
Generating global explanations from local ones
☆11 · Nov 11, 2022 · Updated 3 years ago
IDEA-XL / SubgDiff
The official implementation of NeurIPS2024 paper "SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning."
☆11 · May 28, 2025 · Updated last year
WangLabTHU / DeSP
DNA-D2S: a systematic error simulation Model for DNA Data Storage channel
☆12 · Feb 14, 2022 · Updated 4 years ago
Louise-LuLin / GCL-SPAN
Code for the paper "Spectrum Guided Topology Augmentation for Graph Contrastive Learning"
☆11 · Jul 18, 2023 · Updated 3 years ago
shengtaovvv / Dialogue
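This project consists of three modules. Intent recognition: determines whether the user's intent is business-related or chit-chat. Model retrieval: this part builds a corpus; when the user issues a new query (classified as a business dialogue by intent recognition), it matches the best response for that query, using HSWN for recall (coarse ranking), then computes sentence similarity and uses Lig…
☆12 · Feb 18, 2021 · Updated 5 years ago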
k2-fsa / sherpa-mlx
sherpa with mlx
☆15 · Aug 2, 2025 · Updated 11 months ago
facebookresearch / ToolVerifier
This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.
☆23 · Mar 11, 2024 · Updated 2 years ago
echo840 / LIRA
[ICCV 2025] LIRA
☆22 · Nov 25, 2025 · Updated 7 months ago
liyongqi67 / GRACE
☆29 · Aug 25, 2024 · Updated last year
mengcaopku / SpatialDreamer
SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery
☆15 · Feb 1, 2026 · Updated 5 months ago
webvenky / simple-airplane
A simple ROS-Gazebo package provides a quick headstart for testing high level path planning / visual servoing algorithms on multiple fixe…
☆12 · Feb 28, 2020 · Updated 6 years ago
xxyQwQ / CoMAS
Implementation for the paper "CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards".
☆52 · Jan 26, 2026 · Updated 5 months ago
NJUxlj / Chinese-MedQA-Qwen2
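A medical question-answering system based on Qwen2 + SFT + DPO. The project uses custom SFTTrainer/DPOTrainer/TRPOTrainer classes for training, and also calls various knowledge-base tools (neo4j, milvus, LDA, etc.) to generate training data automatically. In addition, it uses vllm for inference…
☆89 · Apr 29, 2026 · Updated 2 months ago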
xian-sh / MOL-Mamba
☆19 · Sep 4, 2025 · Updated 10 months ago
fchest / Speech-Transformer-multi-GPUs
A PyTorch implementation of Speech Transformer with multi-GPUs, an End-to-End ASR with Transformer network on Mandarin Chinese. This code…
☆10 · Dec 25, 2019 · Updated 6 years ago
IS2AI / MultilingualASR
☆14 · Aug 9, 2021 · Updated 4 years ago
ZhentingWang / DUMP
☆33 · May 9, 2025 · Updated last year
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
Gelelmaster / Funasr-Qwen-GPTSovits
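<Integrated> An AI program that uses Funasr for speech recognition, calls the Qwen large model to generate answers, and outputs speech through GPTSovits. The model is currently called online; an offline large model will be added later.
☆13 · Nov 30, 2024 · Updated last year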
Yui010206 / Adaptive-Visual-Imagination-Control
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
☆18 · Jun 2, 2026 · Updated last month
hoi4d / PPTr
☆11 · Aug 5, 2022 · Updated 3 years ago
thu-coai / Implicit-Toxicity
Official Code for EMNLP 2023 paper: "Unveiling the Implicit Toxicity in Large Language Models"
☆15 · Nov 30, 2023 · Updated 2 years ago
jacky121298 / WLST
[ICRA 2024] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection
☆12 · Feb 6, 2024 · Updated 2 years ago
postmalloc / tinysfm
Structure From Motion in 50 lines using OpenCV
☆13 · May 31, 2021 · Updated 5 years ago
NTIA / alignnet
Train no-reference speech quality estimators with multiple datasets via learned, per-dataset alignments.
☆18 · Aug 1, 2025 · Updated 11 months ago
kaihuhuang / Language-Group
☆11 · Dec 24, 2024 · Updated last year
miaoyuchun / InfoRM
The official implementation of InfoRM [NeurIPS 2024].
☆16 · Oct 25, 2025 · Updated 8 months ago
owenliang / qwen-dpo
DPO training for Tongyi Qianwen (Qwen)
☆66 · Sep 21, 2024 · Updated last year
Miamoto / Conformer-NTM
☆16 · Nov 9, 2023 · Updated 2 years ago
liunian-Jay / AgenticRAG-RL
A minimal implementation of Agentic RAG using GRPO
☆17 · Jun 11, 2025 · Updated last year
zal0302 / PNLS
The official MATLAB implementation of IEEE Transactions on Multimedia 2020 paper "Pixel-level Non-local Image Smoothing with Objective E…
☆19 · Nov 22, 2020 · Updated 5 years ago
KouweiLee / BUAA-2022-SysYCompiler
Fall 2022 semester, Beihang University School of Computer Science: Compiler Principles lab course project
☆12 · Jun 25, 2023 · Updated 3 years ago
zjunlp / KnowRL
KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
☆48 · May 19, 2026 · Updated 2 months ago