A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.
☆163Feb 6, 2025Updated last year
Alternatives and similar repositories for GSM8K-RLVR
Users that are interested in GSM8K-RLVR are comparing it to the libraries listed below.
- various experiments for scaling inference time compute with small reasoning models☆17Jan 16, 2025Updated last year
- [TMLR] Process Reward Models That Think☆82Nov 29, 2025Updated 3 months ago
- Code for the arXiv preprint "Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions"☆15Aug 2, 2025Updated 7 months ago
- 🥇 1st-place solution for LG-AI-Challenge 2022.☆13Jun 6, 2023Updated 2 years ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example☆418Mar 11, 2026Updated last week
- 🥉 Codalab-Microsoft-COCO-Image-Captioning-Challenge 3rd place solution(06.30.21)☆23Apr 6, 2022Updated 3 years ago
- Motion imitation with deep reinforcement learning.☆13Jul 24, 2019Updated 6 years ago
- ☆23Mar 21, 2025Updated last year
- Universal differential equations for ecologists☆14Mar 2, 2026Updated 3 weeks ago
- Code base for internal reward models and PPO training☆24Oct 1, 2023Updated 2 years ago
- ☆21Jan 19, 2024Updated 2 years ago
- Improving upon state of the art cooperative deep reinforcement learning in StarCraft II☆13May 16, 2019Updated 6 years ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆125Jun 11, 2025Updated 9 months ago
- MCP prompt tool applying Chain-of-Draft (CoD) reasoning - BYOLLM☆18Sep 8, 2025Updated 6 months ago
- Verifiers for LLM Reinforcement Learning☆80Apr 15, 2025Updated 11 months ago
- This repository contains several automated feature selection methods for CTR prediction.☆10Dec 18, 2022Updated 3 years ago
- ☆13Jul 12, 2024Updated last year
- Code of paper "HyperVLA: Efficient Inference in Vision-Language-Action Models via Hypernetworks"☆24Oct 8, 2025Updated 5 months ago
- ☆54Jan 30, 2024Updated 2 years ago
- FurNet: A Deep-Learning-Based Framework for Removing Furniture Objects in Room Image☆14Nov 22, 2022Updated 3 years ago
- Code for the paper "Spectrum Guided Topology Augmentation for Graph Contrastive Learning"☆11Jul 18, 2023Updated 2 years ago
- [ACCV 2024] Simple, Easy 3D Object Detection with Point-Wise Semantics☆15Oct 28, 2025Updated 4 months ago
- This is the official repo for Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving. CoRL 2025☆19Oct 20, 2025Updated 5 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models☆142Dec 17, 2025Updated 3 months ago
- Simple RL training for reasoning☆3,841Dec 23, 2025Updated 3 months ago
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation☆79May 2, 2025Updated 10 months ago
- Exploring Applications of GRPO☆252Aug 25, 2025Updated 7 months ago
- Count based exploration with the successor representation for Unity ML's Pyramid☆12Jun 19, 2019Updated 6 years ago
- An adaptive training algorithm for residual network☆17Aug 22, 2020Updated 5 years ago
- Official repository for ORPO☆473May 31, 2024Updated last year
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State☆20Oct 24, 2025Updated 5 months ago
- Maximum Entropy-Regularized Multi-Goal Reinforcement Learning (ICML 2019)☆24May 30, 2019Updated 6 years ago
- ☆12Mar 4, 2025Updated last year
- TaskMet: Task-driven Metric Learning for Model Learning☆20Feb 9, 2024Updated 2 years ago
- A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.☆19Feb 6, 2025Updated last year
- This program ranked first in the audio-only track of DCASE2024 Challenge Task 3.☆20Mar 2, 2026Updated 3 weeks ago
- ☆12Jul 8, 2023Updated 2 years ago
- Plannable Approximations to MDP Homomorphisms: Equivariance under Actions☆30Jun 30, 2020Updated 5 years ago
- MuJoCo models for Unitree Robots☆12Nov 24, 2021Updated 4 years ago