kossisoroyce / train_grpo.py
GRPO training script for a Qwen model on the GSM8K dataset. The script trains a Qwen model with GRPO (Group Relative Policy Optimization) on GSM8K (Grade School Math 8K), using the transformers, PEFT (Parameter-Efficient Fine-Tuning), and TRL (Transformer Reinforcement Learning) libraries.
☆28 · Dec 11, 2025 · Updated 2 months ago
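
The group-relative step that gives GRPO its name can be sketched in a few lines. This is a minimal illustration rather than code from the repository: the function name `group_relative_advantages`, the `eps` value, the use of the population standard deviation, and the sample rewards are all assumptions made for the example. For each prompt, several completions are sampled and scored, and each reward is normalized against the mean and spread of its own group.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper: eps and the population standard deviation are
    illustrative choices, not taken from the repository.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group average get a positive advantage,
    # those below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical correctness rewards for four sampled answers to one GSM8K question
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Because the baseline comes from the group itself, this formulation needs no separate value model, and a group in which every completion receives the same reward yields zero advantage for all of them.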
Alternatives and similar repositories for train_grpo.py
Users who are interested in train_grpo.py are comparing it to the repositories listed below.
- Repository of IPBench ☆19 · Jan 4, 2026 · Updated last month
- ☆14 · Mar 7, 2025 · Updated 11 months ago
- Bayesian structure learning and classification in decomposable graphical models. ☆11 · Jan 22, 2024 · Updated 2 years ago
- Examples for learning assembly language ☆10 · Aug 5, 2021 · Updated 4 years ago
- ☆20 · Aug 8, 2025 · Updated 6 months ago
- Mutual information estimators and benchmarks ☆14 · Feb 6, 2026 · Updated last week
- Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain … ☆19 · Dec 16, 2022 · Updated 3 years ago
- A helloworld project for latent diffusion models using huggingface diffusers ☆15 · Sep 10, 2024 · Updated last year
- A marvelous toolbox for DL research. ☆14 · May 2, 2025 · Updated 9 months ago
- DifferentialEquations.jl with PyTorch ☆11 · Oct 12, 2022 · Updated 3 years ago
- A tutorial on learned non-adversarial invariance in neural networks ☆13 · Dec 8, 2019 · Updated 6 years ago
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization ☆11 · Nov 29, 2022 · Updated 3 years ago
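- Computer vision courseware and study notes (Lu Peng, Beijing University of Posts and Telecommunications) ☆11 · Aug 3, 2021 · Updated 4 years ago
- PyTorch version of Chinese multi-turn Cdial based on GPT + NEZHA ☆12 · Oct 22, 2022 · Updated 3 years ago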
- how to create models using Gurobi in Python ☆14 · Mar 25, 2022 · Updated 3 years ago
- Official implementation of MINDE: Mutual Information Neural Diffusion Estimation ☆22 · Apr 17, 2025 · Updated 9 months ago
- Structure From Motion in 50 lines using OpenCV ☆12 · May 31, 2021 · Updated 4 years ago
- A sequence labeling model implemented with pre-trained BERT. ☆14 · Sep 29, 2020 · Updated 5 years ago
- ☆21 · May 3, 2025 · Updated 9 months ago
- ☆14 · May 4, 2024 · Updated last year
- Official repository for ToolScope: An Agentic Framework for Vision-Guided and Long-Horizon Tool Use ☆28 · Nov 4, 2025 · Updated 3 months ago
- Builds from scratch the most important capability of an Agent: function calling ☆17 · Oct 16, 2024 · Updated last year
- [NeurIPS 2024] "Collaboration! Towards Robust Neural Methods for Routing Problems" ☆21 · Nov 16, 2024 · Updated last year
- ☆36 · Jun 25, 2025 · Updated 7 months ago
- Official PyTorch implementation of the paper InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization ☆21 · Jul 10, 2024 · Updated last year
- Python code parsing data from PhysioNet Challenge 2012 ☆22 · Oct 24, 2018 · Updated 7 years ago
- Benchmarking MIAs against LLMs. ☆28 · Oct 8, 2024 · Updated last year
- Implementation of self-certainty as an extension of the ZeroEval project ☆34 · May 31, 2025 · Updated 8 months ago
- ☆21 · Jun 16, 2020 · Updated 5 years ago
- NeurIPS'22 Oral: EquiVSet - Learning Neural Set Functions Under the Optimal Subset Oracle ☆21 · Dec 23, 2022 · Updated 3 years ago
- Codes for ICLR 21 paper: Neural Approximate Sufficient Statistics for Implicit Models ☆20 · Jun 23, 2022 · Updated 3 years ago
- ☆23 · Oct 17, 2024 · Updated last year
- Implementation of a PyTorch Mutual Information Estimation Toolkit ☆23 · Apr 5, 2024 · Updated last year
- Implicit Deep Adaptive Design (iDAD): Policy-Based Experimental Design without Likelihoods ☆22 · Dec 30, 2021 · Updated 4 years ago
- ☆23 · Feb 8, 2024 · Updated 2 years ago
- OptiBench and ReSocratic Synthesis Method ☆30 · Oct 2, 2025 · Updated 4 months ago
- ☆25 · Dec 23, 2019 · Updated 6 years ago
- ☆21 · Mar 17, 2025 · Updated 10 months ago