fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆87Updated last month
Alternatives and similar repositories for Tiny-GRPO:
Users who are interested in Tiny-GRPO are comparing it to the libraries listed below:
- An extension of the nanoGPT repository for training small MOE models.☆138Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆172Updated 3 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆103Updated 3 weeks ago
- ☆181Updated 2 months ago
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor…☆84Updated 3 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch☆165Updated 4 months ago
- Prune transformer layers☆69Updated 11 months ago
- Exploring Applications of GRPO☆189Updated this week
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks".☆112Updated last week
- ☆47Updated 8 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆153Updated 3 weeks ago
- This is the official repository for Inheritune.☆111Updated 2 months ago
- Tina: Tiny Reasoning Models via LoRA☆164Updated 2 weeks ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆141Updated 2 weeks ago
- ☆50Updated 11 months ago
- PyTorch building blocks for the OLMo ecosystem☆205Updated this week
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆90Updated 2 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct☆28Updated 2 months ago
- Simple extension on vLLM to help you speed up reasoning models without training.☆148Updated this week
- Train your own SOTA deductive reasoning model☆91Updated 2 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline"☆116Updated 10 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates.☆100Updated this week
- ☆92Updated 7 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆226Updated this week
- Unofficial Implementation of Evolutionary Model Merging☆38Updated last year
- Simple repository for training small reasoning models☆27Updated 3 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆44Updated 8 months ago
- Official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆118Updated this week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.☆338Updated this week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆143Updated 7 months ago