Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR (reinforcement learning with verifiable rewards) on GSM8K. This repository provides a starting point for exploring reasoning.
☆162 · Feb 6, 2025 · Updated last year
Alternatives and similar repositories for GSM8K-RLVR
Users who are interested in GSM8K-RLVR are comparing it to the libraries listed below.
- Motion imitation with deep reinforcement learning. ☆13 · Jul 24, 2019 · Updated 6 years ago
- Official PyTorch implementation for Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning ☆19 · Jan 11, 2023 · Updated 3 years ago
- A multi-agent framework to help with your homework. ☆10 · Mar 1, 2025 · Updated 11 months ago
- React application using Segment Anything in the browser ☆10 · Oct 9, 2023 · Updated 2 years ago
- ☆11 · Oct 19, 2020 · Updated 5 years ago
- Improving upon state-of-the-art cooperative deep reinforcement learning in StarCraft II ☆13 · May 16, 2019 · Updated 6 years ago
- ☆27 · Mar 13, 2024 · Updated last year
- Code for the paper "Alpha Zero in Continuous Action Space" (A0C) (https://arxiv.org/pdf/1805.09613.pdf) ☆15 · Jan 19, 2021 · Updated 5 years ago
- MuJoCo models for Unitree robots ☆12 · Nov 24, 2021 · Updated 4 years ago
- 🧪 categorical tabnet research part 🧪 ☆13 · Apr 12, 2024 · Updated last year
- Plannable Approximations to MDP Homomorphisms: Equivariance under Actions ☆30 · Jun 30, 2020 · Updated 5 years ago
- Maximum Entropy-Regularized Multi-Goal Reinforcement Learning (ICML 2019) ☆24 · May 30, 2019 · Updated 6 years ago
- Count-based exploration with the successor representation for Unity ML's Pyramid ☆12 · Jun 19, 2019 · Updated 6 years ago
- RLVR testing and training ☆23 · Aug 28, 2025 · Updated 5 months ago
- ☆13 · Jan 27, 2019 · Updated 7 years ago
- This is the code for the GA-DRL-Aubo paper ☆14 · Apr 8, 2022 · Updated 3 years ago
- Code for minimum-entropy coupling. ☆32 · Jan 6, 2026 · Updated last month
- LinChance Fine-tuning System: a model fine-tuning web UI built with Streamlit and LLaMA-Factory ☆14 · Feb 4, 2024 · Updated 2 years ago
- 3D learning environment with rigid body simulation for Linux/MacOSX ☆14 · Dec 24, 2021 · Updated 4 years ago
- This package is an implementation of Dexterous Ungrasping, which refers to the task of securely transferring an object from the gripper t… ☆12 · Feb 4, 2022 · Updated 4 years ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆125 · Jun 11, 2025 · Updated 8 months ago
- Collection of LLM completions for reasoning-gym task datasets ☆30 · Jul 4, 2025 · Updated 7 months ago
- Various experiments for scaling inference-time compute with small reasoning models ☆17 · Jan 16, 2025 · Updated last year
- MImE - Manipulation Imitation Environments ☆14 · Feb 1, 2022 · Updated 4 years ago
- ☆19 · May 19, 2024 · Updated last year
- Smart commit messages ☆18 · Oct 25, 2024 · Updated last year
- ☆17 · Mar 2, 2024 · Updated last year
- 🥇 The 1st-place solution for the LG-AI-Challenge 2022. ☆13 · Jun 6, 2023 · Updated 2 years ago
- Open-source code combining implementations of Upside Down Reinforcement Learning and Reward Conditioned Policies ☆19 · Mar 10, 2021 · Updated 4 years ago
- ☆11 · Updated this week
- Representation Learning in RL ☆13 · Jun 1, 2022 · Updated 3 years ago
- ☆21 · Jan 19, 2024 · Updated 2 years ago
- Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games". ☆17 · Jun 20, 2024 · Updated last year
- Implementation for "ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning", CoRL 2020 ☆16 · Jun 22, 2022 · Updated 3 years ago
- RO-ViT (CVPR 2023): "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers" ☆17 · Aug 24, 2023 · Updated 2 years ago
- An adaptive training algorithm for residual networks ☆17 · Aug 22, 2020 · Updated 5 years ago
- ☆18 · Feb 7, 2021 · Updated 5 years ago
- GPT implementation in Flax ☆18 · Jan 8, 2022 · Updated 4 years ago
- TaskMet: Task-driven Metric Learning for Model Learning ☆20 · Feb 9, 2024 · Updated 2 years ago