Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and DeepSeek R1. This repository provides a starting point for exploring RL-based reasoning.
☆72 Updated last month
Alternatives and similar repositories for GSM8K-RLVR:
Users who are interested in GSM8K-RLVR are comparing it to the repositories listed below.
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆168 Updated 2 months ago
- ☆111 Updated last month
- ☆103 Updated 2 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆140 Updated this week
- ☆188 Updated last month
- ☆96 Updated 9 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆131 Updated last month
- ☆119 Updated 5 months ago
- ☆65 Updated 4 months ago
- Complex Function Calling Benchmark. ☆85 Updated 2 months ago
- ☆160 Updated 3 weeks ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆142 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆219 Updated 4 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings" ☆51 Updated last month
- Self-playing Adversarial Language Game Enhances LLM Reasoning, NeurIPS 2024 ☆124 Updated last month
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples ☆80 Updated this week
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆82 Updated this week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆91 Updated this week
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆131 Updated 4 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆95 Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆180 Updated this week
- ☆262 Updated 2 weeks ago
- Train your own SOTA deductive reasoning model ☆81 Updated 3 weeks ago
- This is the official repository for Inheritune. ☆111 Updated last month
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. ☆70 Updated last month
- Reformatted Alignment ☆115 Updated 6 months ago
- Augmented LLM with self-reflection ☆117 Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆161 Updated 2 months ago
- ☆52 Updated 2 weeks ago