A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆163Feb 6, 2025Updated last year
Alternatives and similar repositories for GSM8K-RLVR
Users that are interested in GSM8K-RLVR are comparing it to the libraries listed below.
- various experiments for scaling inference time compute with small reasoning models☆17Jan 16, 2025Updated last year
- This is the official repository for "CoAnnotating: Uncertainty-Guided Work Allocation between Human and Large Language Models for Data An…☆23Oct 26, 2023Updated 2 years ago
- Official PyTorch Implementation for Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning☆20Jan 11, 2023Updated 3 years ago
- [TMLR] Process Reward Models That Think☆84Nov 29, 2025Updated 4 months ago
- Code for the arXiv preprint "Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions"☆15Aug 2, 2025Updated 8 months ago
- 🎖️ 5th place solution in the Google American Sign Language Fingerspelling Recognition Competition🎖️☆16Sep 19, 2023Updated 2 years ago
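- Code implementation for my blog post "Building a NumPy-based convolutional neural network in Python, without a framework, as a deep learning system for CIFAR-10 classification".☆10Jul 1, 2019Updated 6 years ago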
- ☆13Jan 22, 2026Updated 2 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example☆424Mar 11, 2026Updated last month
- Code for the paper Alpha Zero in Continuous Action Space (A0C) (https://arxiv.org/pdf/1805.09613.pdf)☆15Jan 19, 2021Updated 5 years ago
- Jax/Flax implementation of DeiT and DeiT-III (ViT)☆19Dec 21, 2024Updated last year
- Motion imitation with deep reinforcement learning.☆13Jul 24, 2019Updated 6 years ago
- ☆30Nov 5, 2024Updated last year
- ☆19May 19, 2024Updated last year
- Universal differential equations for ecologists☆14Mar 24, 2026Updated 3 weeks ago
- ☆21Aug 30, 2025Updated 7 months ago
- Improving upon state of the art cooperative deep reinforcement learning in StarCraft II☆13May 16, 2019Updated 6 years ago
- Train transformer language models with reinforcement learning.☆19Feb 25, 2025Updated last year
- ☆11Oct 19, 2020Updated 5 years ago
- The official implementation of NeurIPS2024 paper "SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning."☆11May 28, 2025Updated 10 months ago
- MCP prompt tool applying Chain-of-Draft (CoD) reasoning - BYOLLM☆18Sep 8, 2025Updated 7 months ago
- Verifiers for LLM Reinforcement Learning☆79Apr 15, 2025Updated 11 months ago
- DNA-D2S: a systematic error simulation Model for DNA Data Storage channel☆12Feb 14, 2022Updated 4 years ago
- TOKEN-IMPORTANCE GUIDED DIRECT PREFERENCE OPTIMIZATION☆29Jan 26, 2026Updated 2 months ago
- This repository contains several automated feature selection methods for CTR prediction.☆10Dec 18, 2022Updated 3 years ago
- ☆13Jul 12, 2024Updated last year
- Code for the paper "Mehta, S. V., Patil, D., Chandar, S., & Strubell, E. (2023). An Empirical Investigation of the Role of Pre-training i…☆17Mar 18, 2024Updated 2 years ago
- Hand Mesh Recovery models on OakInk-Image dataset☆13Apr 4, 2024Updated 2 years ago
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆18Jun 1, 2024Updated last year
- Implicit Differentiable Optimal Control (IDOC) with JAX☆12May 11, 2022Updated 3 years ago
- FurNet: A Deep-Learning-Based Framework for Removing Furniture Objects in Room Image☆14Nov 22, 2022Updated 3 years ago
- Collection of LLM completions for reasoning-gym task datasets☆31Jul 4, 2025Updated 9 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation☆80May 2, 2025Updated 11 months ago
- This repo holds the code, dataset, and running scripts for fast k-means evaluation☆15May 20, 2022Updated 3 years ago
- Count based exploration with the successor representation for Unity ML's Pyramid☆12Jun 19, 2019Updated 6 years ago
- [NeurIPS'23] Binary Classification with Confidence Difference☆10May 13, 2024Updated last year
- An adaptive training algorithm for residual network☆17Aug 22, 2020Updated 5 years ago
- ☆25Aug 29, 2025Updated 7 months ago
- Simple RL training for reasoning☆3,846Dec 23, 2025Updated 3 months ago