mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 11 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- minimal GRPO implementation from scratch☆102Updated 10 months ago
- ☆48Updated last year
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆143Updated 9 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆53Updated last year
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 4 months ago
- Simple repository for training small reasoning models☆49Updated last year
- Simple GRPO scripts and configurations.☆59Updated last year
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆162Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆175Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆250Updated last year
- [ICLR 2026] Tina: Tiny Reasoning Models via LoRA☆319Updated 4 months ago
- Verifiers for LLM Reinforcement Learning☆80Updated 9 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆128Updated 4 months ago
- Collection of autoregressive model implementations☆85Updated this week
- An extension of the nanoGPT repository for training small MoE models.☆236Updated 11 months ago
- Set of scripts to finetune LLMs☆38Updated last year
- My fork of Allen AI's OLMo for educational purposes.☆29Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think☆89Updated this week
- Universal Reasoning Model☆122Updated 3 weeks ago
- A pipeline for LLM knowledge distillation☆112Updated 10 months ago
- ☆86Updated 2 years ago
- Train your own SOTA deductive reasoning model☆107Updated 11 months ago
- ☆52Updated last year
- ☆46Updated 10 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆102Updated 5 months ago
- ☆63Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok …☆30Updated 2 months ago
- Exploring Applications of GRPO☆251Updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 3 months ago