kossisoroyce / train_grpo.py

GRPO training script for a Qwen model on the GSM8K dataset. The script trains a Qwen model with GRPO (Group Relative Policy Optimization) on GSM8K (Grade School Math 8K), using the transformers, PEFT (Parameter-Efficient Fine-Tuning), and TRL (Transformer Reinforcement Learning) libraries.
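As a rough illustration of what GRPO on GSM8K involves, the sketch below scores sampled completions against a GSM8K reference solution (whose final answer follows a `####` marker) and then normalizes the rewards within one group of samples, which is the "group relative" part of GRPO. It is not taken from the repository: the function names, the last-number matching rule, the use of population standard deviation, and the `eps` value are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev

# Matches integers and decimals, with optional thousands separators.
NUMBER = r"-?\d+(?:,\d{3})*(?:\.\d+)?"


def reference_answer(solution):
    """Return the number after the '####' marker of a GSM8K solution, or None."""
    match = re.search(r"####\s*(" + NUMBER + r")", solution)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def correctness_reward(completion, solution):
    """Hypothetical reward: 1.0 if the last number in the completion equals the reference."""
    target = reference_answer(solution)
    numbers = re.findall(NUMBER, completion)
    if target is None or not numbers:
        return 0.0
    predicted = float(numbers[-1].replace(",", ""))
    return 1.0 if predicted == target else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    solution = "Natalia sold 48 / 2 = 24 clips in May.\n48 + 24 = 72 clips in total.\n#### 72"
    group = [
        "48 + 24 = 72, so the answer is 72.",
        "She sold 48 clips, so the answer is 48.",
        "I am not sure.",
        "Half of 48 is 24, and 48 + 24 = 72",
    ]
    rewards = [correctness_reward(c, solution) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Completions that beat the group average get a positive advantage and the rest a negative one. When every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.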
28 · Dec 11, 2025 · Updated 2 months ago
