A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆170Feb 6, 2025Updated last year
Alternatives and similar repositories for GSM8K-RLVR
Users who are interested in GSM8K-RLVR are comparing it to the libraries listed below.
- various experiments for scaling inference time compute with small reasoning models☆17Jan 16, 2025Updated last year
- ☆27Mar 13, 2024Updated 2 years ago
- Official PyTorch Implementation for Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning☆20Jan 11, 2023Updated 3 years ago
- Representation Learning in RL☆13Jun 1, 2022Updated 4 years ago
- [TMLR] Process Reward Models That Think☆89Nov 29, 2025Updated 6 months ago
- 🎖️ 5th place solution in the Google American Sign Language Fingerspelling Recognition Competition🎖️☆16Sep 19, 2023Updated 2 years ago
- 🧪categorical tabnet research part🧪☆13Apr 12, 2024Updated 2 years ago
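- Code implementation for my blog post "Building a NumPy-based convolutional neural network in Python, without a framework, as a deep learning system for CIFAR-10 classification."☆10Jul 1, 2019Updated 6 years ago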
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example☆437Mar 11, 2026Updated 3 months ago
- Jax/Flax implementation of DeiT and DeiT-III (ViT)☆19Dec 21, 2024Updated last year
- Motion imitation with deep reinforcement learning.☆13Jul 24, 2019Updated 6 years ago
- ☆19May 19, 2024Updated 2 years ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- ☆23Jan 19, 2024Updated 2 years ago
- Train transformer language models with reinforcement learning.☆19Feb 25, 2025Updated last year
- ☆11Oct 19, 2020Updated 5 years ago
- The official implementation of NeurIPS2024 paper "SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning."☆11May 28, 2025Updated last year
- Verifiers for LLM Reinforcement Learning☆80Apr 15, 2025Updated last year
- Implementation codes for NeurIPS23 paper "Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts"☆14Mar 19, 2024Updated 2 years ago
- DNA-D2S: a systematic error simulation Model for DNA Data Storage channel☆12Feb 14, 2022Updated 4 years ago
- Universal differential equations for ecologists☆15Apr 24, 2026Updated last month
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆19Jun 1, 2024Updated 2 years ago
- Implicit Differentiable Optimal Control (IDOC) with JAX☆12May 11, 2022Updated 4 years ago
- ☆15Jul 18, 2025Updated 10 months ago
- Collection of LLM completions for reasoning-gym task datasets☆31Jul 4, 2025Updated 11 months ago
- ☆12Sep 21, 2024Updated last year
- This is the official repo for Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving. CoRL 2025☆21Oct 20, 2025Updated 7 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆143Dec 17, 2025Updated 5 months ago
- [AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning".☆23Mar 18, 2026Updated 2 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation☆81May 2, 2025Updated last year
- Count based exploration with the successor representation for Unity ML's Pyramid☆12Jun 19, 2019Updated 6 years ago
- Official Inspect Implementation for "ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases"☆40Dec 1, 2025Updated 6 months ago
- A library for constrained RLHF.☆13Feb 19, 2024Updated 2 years ago
- Code for minimum-entropy coupling.☆33Jan 6, 2026Updated 5 months ago
- Simple RL training for reasoning☆3,864Dec 23, 2025Updated 5 months ago
- RLVR Testing and Training☆23Aug 28, 2025Updated 9 months ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State☆21Oct 24, 2025Updated 7 months ago
- Maximum Entropy-Regularized Multi-Goal Reinforcement Learning (ICML 2019)☆24May 30, 2019Updated 7 years ago
- Implementation of Dual Learning NMT & Joint Training on tensorflow☆12Dec 29, 2018Updated 7 years ago