mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 10 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆138Updated 7 months ago
- minimal GRPO implementation from scratch☆100Updated 9 months ago
- Tina: Tiny Reasoning Models via LoRA☆310Updated 3 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 2 months ago
- ☆48Updated last year
- An extension of the nanoGPT repository for training small MOE models.☆219Updated 9 months ago
- Exploring Applications of GRPO☆250Updated 4 months ago
- Simple repository for training small reasoning models☆47Updated 10 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆245Updated last year
- Collection of autoregressive model implementations☆85Updated 8 months ago
- Simple GRPO scripts and configurations.☆59Updated 10 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆174Updated 11 months ago
- ☆52Updated last year
- ☆45Updated 7 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆151Updated 10 months ago
- ☆86Updated last year
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆53Updated last year
- Verifiers for LLM Reinforcement Learning☆80Updated 8 months ago
- Train your own SOTA deductive reasoning model☆107Updated 9 months ago
- My fork of Allen AI's OLMo for educational purposes.☆30Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆64Updated 7 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆100Updated 3 months ago
- A pipeline for LLM knowledge distillation☆111Updated 8 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆108Updated 9 months ago
- ☆68Updated last year
- A compact LLM pretrained in 9 days by using high quality data☆337Updated 8 months ago
- Set of scripts to finetune LLMs☆38Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆122Updated 2 months ago
- Esoteric Language Models☆108Updated last month
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆37Updated 7 months ago