mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆30Updated 7 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- minimal GRPO implementation from scratch☆98Updated 6 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆120Updated 4 months ago
- ☆48Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆172Updated 8 months ago
- Simple GRPO scripts and configurations.☆59Updated 7 months ago
- My fork of Allen AI's OLMo for educational purposes.☆30Updated 9 months ago
- Train your own SOTA deductive reasoning model☆107Updated 6 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 7 months ago
- Simple repository for training small reasoning models☆40Updated 7 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆129Updated 7 months ago
- An extension of the nanoGPT repository for training small MOE models.☆195Updated 6 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆49Updated last year
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆100Updated last month
- ☆67Updated last year
- Collection of autoregressive model implementations☆86Updated 5 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆240Updated 11 months ago
- Tina: Tiny Reasoning Models via LoRA☆284Updated last week
- Set of scripts to finetune LLMs☆38Updated last year
- Exploring Applications of GRPO☆250Updated last month
- [EMNLP 2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"☆64Updated 5 months ago
- Verifiers for LLM Reinforcement Learning☆74Updated 5 months ago
- ☆85Updated last year
- ☆123Updated 7 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts", by Xu Owen He at DeepMind☆128Updated last year
- Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google DeepMind☆178Updated last year
- This is the official repository for Inheritune.☆113Updated 7 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆105Updated 6 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆72Updated 5 months ago
- A pipeline for LLM knowledge distillation☆109Updated 6 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆290Updated last week