Mohammadjafari80/GSM8K-RLVR

A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.

☆170

Alternatives and similar repositories for GSM8K-RLVR

Users who are interested in GSM8K-RLVR are comparing it to the libraries listed below.

rawsh / mirrorllm
Various experiments for scaling inference-time compute with small reasoning models
☆17 · Jan 16, 2025 · Updated last year
PrasannS / rlhf-length-biases
☆27 · Mar 13, 2024 · Updated 2 years ago
Cranial-XIX / metric-residual-network
Official PyTorch Implementation for Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning
☆20 · Jan 11, 2023 · Updated 3 years ago
yudasong / briee
Representation Learning in RL
☆13 · Jun 1, 2022 · Updated 4 years ago
mukhal / ThinkPRM
[TMLR] Process Reward Models That Think
☆89 · Nov 29, 2025 · Updated 6 months ago
affjljoo3581 / Google-American-Sign-Language-Fingerspelling-Recognition
🎖️ 5th place solution in the Google American Sign Language Fingerspelling Recognition Competition 🎖️
☆16 · Sep 19, 2023 · Updated 2 years ago
ds-wook / categorical-tabnet
🧪 categorical tabnet research part 🧪
☆13 · Apr 12, 2024 · Updated 2 years ago
JackKuo666 / a_numpy_based_implement_cnn
Code for my blog post "A deep learning system for CIFAR-10 classification: building a NumPy-based convolutional neural network in Python, without a framework".
☆10 · Jul 1, 2019 · Updated 6 years ago
ypwang61 / One-Shot-RLVR
[NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example
☆437 · Mar 11, 2026 · Updated 3 months ago
affjljoo3581 / deit3-jax
Jax/Flax implementation of DeiT and DeiT-III (ViT)
☆19 · Dec 21, 2024 · Updated last year
zhaolongkzz / DeepMimic_configuration
Motion imitation with deep reinforcement learning.
☆13 · Jul 24, 2019 · Updated 6 years ago
BriansIDP / AudioVisualLLM
☆19 · May 19, 2024 · Updated 2 years ago
chai-research / lmgym
Code base for internal reward models and PPO training
☆24 · Oct 1, 2023 · Updated 2 years ago
apple / ml-np-rasp
☆23 · Jan 19, 2024 · Updated 2 years ago
willccbb / trl
Train transformer language models with reinforcement learning.
☆19 · Feb 25, 2025 · Updated last year
zwfightzw / Meta-Critic
☆11 · Oct 19, 2020 · Updated 5 years ago
IDEA-XL / SubgDiff
The official implementation of the NeurIPS 2024 paper "SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning."
☆11 · May 28, 2025 · Updated last year
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆80 · Apr 15, 2025 · Updated last year
wondergo2017 / sild
Implementation code for the NeurIPS 2023 paper "Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts"
☆14 · Mar 19, 2024 · Updated 2 years ago
WangLabTHU / DeSP
DNA-D2S: a systematic error simulation model for the DNA data storage channel
☆12 · Feb 14, 2022 · Updated 4 years ago
Jack-H-Buckner / UniversalDiffEq.jl
Universal differential equations for ecologists
☆15 · Apr 24, 2026 · Updated last month
junkangwu / Dr_DPO
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
☆19 · Jun 1, 2024 · Updated 2 years ago
tachukao / idoc
Implicit Differentiable Optimal Control (IDOC) with JAX
☆12 · May 11, 2022 · Updated 4 years ago
ren258 / ARENA
☆15 · Jul 18, 2025 · Updated 10 months ago
open-thought / reasoning-gym-eval
Collection of LLM completions for reasoning-gym task datasets
☆31 · Jul 4, 2025 · Updated 11 months ago
ainativehealth / GoodMedicalCoder
☆12 · Sep 21, 2024 · Updated last year
UDEER-AI / LLM2AD
Official repo for "Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving" (CoRL 2025).
☆21 · Oct 20, 2025 · Updated 7 months ago
Parallel-Reasoning / APR
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
☆143 · Dec 17, 2025 · Updated 5 months ago
zzh-thu-22 / ExtendAttack
[AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning".
☆23 · Mar 18, 2026 · Updated 2 months ago
CodeCreator / WebOrganizer
Organize the Web: Constructing Domains Enhances Pre-Training Data Curation
☆81 · May 2, 2025 · Updated last year
bonniesjli / DQN_SR
Count-based exploration with the successor representation for Unity ML's Pyramid
☆12 · Jun 19, 2019 · Updated 6 years ago
safety-research / impossiblebench
Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"
☆40 · Dec 1, 2025 · Updated 6 months ago
tedmoskovitz / ConstrainedRL4LMs
A library for constrained RLHF.
☆13 · Feb 19, 2024 · Updated 2 years ago
ssokota / mec
Code for minimum-entropy coupling.
☆33 · Jan 6, 2026 · Updated 5 months ago
hkust-nlp / simpleRL-reason
Simple RL training for reasoning
☆3,864 · Dec 23, 2025 · Updated 5 months ago
Pavankunchala / Reinforcement-learning-with-verifable-rewards-Learnings
RLVR Testing and Training
☆23 · Aug 28, 2025 · Updated 9 months ago
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper "Future Lens: Anticipating Subsequent Tokens from a Single Hidden State"
☆21 · Oct 24, 2025 · Updated 7 months ago
ruizhaogit / mep
Maximum Entropy-Regularized Multi-Goal Reinforcement Learning (ICML 2019)
☆24 · May 30, 2019 · Updated 7 years ago
0uO / Dual-learning
Implementation of Dual Learning NMT & Joint Training on TensorFlow
☆12 · Dec 29, 2018 · Updated 7 years ago