kossisoroyce / train_grpo.py

GRPO Training Script for a Qwen Model on the GSM8K Dataset. This script trains a Qwen model with GRPO (Group Relative Policy Optimization) on the GSM8K (Grade School Math 8K) dataset. It uses the transformers, PEFT (Parameter-Efficient Fine-Tuning), and TRL (Transformer Reinforcement Learning) libraries.

☆28
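The description names GRPO and GSM8K but does not show what a training step actually scores. Below is a minimal, self-contained sketch, not the code in train_grpo.py. It assumes a binary correctness reward that compares the last number in a completion with the gold answer after GSM8K's `####` marker. It then computes the GRPO-style group-relative advantage, normalizing each completion's reward by the mean and standard deviation of the group sampled for the same prompt. The helper names (`gsm8k_gold_answer`, `last_number`, `correctness_rewards`, `group_relative_advantages`), the `eps` value, and the example reference text are hypothetical illustrations.

```python
import re
import statistics
from typing import List, Optional

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def gsm8k_gold_answer(reference: str) -> float:
    """GSM8K reference solutions end with a line like '#### 72'; parse the number after '####'."""
    return float(reference.split("####")[-1].strip().replace(",", ""))


def last_number(text: str) -> Optional[float]:
    """Return the last number in a completion (commas removed), or None if it contains no number."""
    matches = _NUMBER.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def correctness_rewards(completions: List[str], reference: str) -> List[float]:
    """Hypothetical binary reward: 1.0 if a completion's last number equals the gold answer, else 0.0."""
    gold = gsm8k_gold_answer(reference)
    return [1.0 if last_number(c) == gold else 0.0 for c in completions]


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage for one prompt's group: (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumption); some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative GSM8K-style reference and a made-up group of four sampled completions.
    reference = "Natalia sold 48/2 = 24 clips in May.\nNatalia sold 48+24 = 72 clips altogether.\n#### 72"
    group = [
        "48 / 2 = 24, and 48 + 24 = 72, so the answer is 72.",
        "She sold 48 + 22 = 70 clips.",
        "The total is 72.",
        "I am not sure.",
    ]
    rewards = correctness_rewards(group, reference)
    print(rewards)  # [1.0, 0.0, 1.0, 0.0]
    print(group_relative_advantages(rewards))
```

Correct completions receive a positive advantage and incorrect ones a negative advantage of equal size. A group in which every completion earns the same reward yields all-zero advantages, so it contributes no learning signal. In GRPO these advantages then weight each completion's token log-probabilities in a clipped policy-gradient objective, optionally with a KL penalty toward a reference model. Because the group mean serves as the baseline, no separate value network is needed.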

Alternatives and similar repositories for train_grpo.py

Users who are interested in train_grpo.py are comparing it to the repositories listed below.

IPBench / IPBench
Repository of IPBench
☆19 · Jan 4, 2026 · Updated last month
FanZT6 / FairMT-bench
☆14 · Mar 7, 2025 · Updated 11 months ago
felixleopoldo / trilearn
Bayesian structure learning and classification in decomposable graphical models.
☆11 · Jan 22, 2024 · Updated 2 years ago
StevenBaby / learning-assembly
Examples for learning assembly language
☆10 · Aug 5, 2021 · Updated 4 years ago
staymylove / COT_Compresstion_via_Step_entropy
☆20 · Aug 8, 2025 · Updated 6 months ago
VanessB / mutinfo
Mutual information estimators and benchmarks
☆14 · Feb 6, 2026 · Updated last week
ictnlp / AIH
Code for the Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain …
☆19 · Dec 16, 2022 · Updated 3 years ago
nevinbaiju / latent_diffusion-mnist
A hello-world project for latent diffusion models using Hugging Face diffusers
☆15 · Sep 10, 2024 · Updated last year
xrj-com / marveltoolbox
A marvelous toolbox for DL research.
☆14 · May 2, 2025 · Updated 9 months ago
sbi-benchmark / diffeqtorch
DifferentialEquations.jl with PyTorch
☆11 · Oct 12, 2022 · Updated 3 years ago
dcmoyer / invariance-tutorial
A tutorial on learned non-adversarial invariance in neural networks
☆13 · Dec 8, 2019 · Updated 6 years ago
gccnlp / Light-PEFT
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13 · Sep 2, 2024 · Updated last year
qingguo666 / FLO
Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization
☆11 · Nov 29, 2022 · Updated 3 years ago
Hanxun-Yu / BUPT_PengLu_CV
Course slides and study notes for Lu Peng's computer vision course at Beijing University of Posts and Telecommunications (BUPT)
☆11 · Aug 3, 2021 · Updated 4 years ago
wakafengfan / CDial-GPT-NEZHA
A PyTorch implementation of Chinese multi-turn dialogue (CDial) based on GPT + NEZHA
☆12 · Oct 22, 2022 · Updated 3 years ago
PariseC / modeling_examples_using_gurobi_in_python
How to create models using Gurobi in Python
☆14 · Mar 25, 2022 · Updated 3 years ago
MustaphaBounoua / minde
Official implementation of MINDE: Mutual Information Neural Diffusion Estimation
☆22 · Apr 17, 2025 · Updated 9 months ago
postmalloc / tinysfm
Structure From Motion in 50 lines using OpenCV
☆12 · May 31, 2021 · Updated 4 years ago
yym6472 / bert_slot_tagging
A sequence labeling model implemented with pretrained BERT.
☆14 · Sep 29, 2020 · Updated 5 years ago
CIAM-Group / SIL
☆21 · May 3, 2025 · Updated 9 months ago
JingXuTHU / Random-Masking-Finds-Winning-Tickets-for-Parameter-Efficient-Fine-tuning
☆14 · May 4, 2024 · Updated last year
dengmengjie / ToolScope
Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use
☆28 · Nov 4, 2025 · Updated 3 months ago
astordu / agent_from_scratch
Builds the most important agent capability, function calling, from scratch
☆17 · Oct 16, 2024 · Updated last year
RoyalSkye / Routing-CNF
[NeurIPS 2024] "Collaboration! Towards Robust Neural Methods for Routing Problems"
☆21 · Nov 16, 2024 · Updated last year
KexinHUANG19 / InstructTTSEval
☆36 · Jun 25, 2025 · Updated 7 months ago
datou30 / InfoNet
Official PyTorch implementation of the paper InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization
☆21 · Jul 10, 2024 · Updated last year
alistairewj / challenge2012
Python code parsing data from PhysioNet Challenge 2012
☆22 · Oct 24, 2018 · Updated 7 years ago
imperial-aisp / mia_llms_benchmark
Benchmarking MIAs against LLMs.
☆28 · Oct 8, 2024 · Updated last year
backprop07 / Self-Certainty
Implementation of self-certainty as an extension of the ZeroEval project
☆34 · May 31, 2025 · Updated 8 months ago
yaohungt / Pointwise_Dependency_Neural_Estimation
☆21 · Jun 16, 2020 · Updated 5 years ago
SubsetSelection / EquiVSet
NeurIPS'22 Oral: EquiVSet - Learning Neural Set Functions Under the Optimal Subset Oracle
☆21 · Dec 23, 2022 · Updated 3 years ago
cyz-ai / neural-approx-ss-lfi
Code for the ICLR 2021 paper: Neural Approximate Sufficient Statistics for Implicit Models
☆20 · Jun 23, 2022 · Updated 3 years ago
pengzhendong / audio-pipeline
☆23 · Oct 17, 2024 · Updated last year
mfederici / torch-mist
Implementation of a PyTorch Mutual Information Estimation Toolkit
☆23 · Apr 5, 2024 · Updated last year
desi-ivanova / idad
Implicit Deep Adaptive Design (iDAD): Policy-Based Experimental Design without Likelihoods
☆22 · Dec 30, 2021 · Updated 4 years ago
LOGO-CUHKSZ / ASP
☆23 · Feb 8, 2024 · Updated 2 years ago
yangzhch6 / ReSocratic
OptiBench and ReSocratic Synthesis Method
☆30 · Oct 2, 2025 · Updated 4 months ago
DeepBrainAI / ERD
☆25 · Dec 23, 2019 · Updated 6 years ago
David-Li0406 / AI-Supervision-Risk
☆21 · Mar 17, 2025 · Updated 10 months ago