Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and DeepSeek R1. This repository provides a starting point for exploring RL-based reasoning.
☆78 Updated 2 months ago
Alternatives and similar repositories for GSM8K-RLVR:
Users who are interested in GSM8K-RLVR are comparing it to the libraries listed below.
- ☆107 Updated 3 months ago
- minimal GRPO implementation from scratch ☆85 Updated last month
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆133 Updated 5 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆84 Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆86 Updated last month
- ☆96 Updated 9 months ago
- ☆114 Updated 2 months ago
- ☆120 Updated 6 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆171 Updated 3 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆139 Updated this week
- Repository for the paper Stream of Search: Learning to Search in Language ☆145 Updated 2 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆56 Updated 2 months ago
- Replicating O1 inference-time scaling laws ☆83 Updated 4 months ago
- Complex Function Calling Benchmark. ☆98 Updated 3 months ago
- ☆166 Updated last week
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 Updated 8 months ago
- augmented LLM with self reflection ☆119 Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆213 Updated 5 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆235 Updated 5 months ago
- ☆57 Updated last month
- ☆70 Updated 5 months ago
- Train your own SOTA deductive reasoning model ☆88 Updated last month
- ☆117 Updated 7 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 Updated 3 months ago
- ☆55 Updated 2 weeks ago
- Official repo for paper: "Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't" ☆204 Updated last month
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆100 Updated last week
- Benchmarking LLMs with Challenging Tasks from Real Users ☆221 Updated 5 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆55 Updated 2 weeks ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆182 Updated last week