mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 9 months ago
Alternatives and similar repositories for RL4LLM
Users who are interested in RL4LLM are comparing it to the libraries listed below:
- minimal GRPO implementation from scratch☆100Updated 8 months ago
- ☆48Updated last year
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆126Updated 6 months ago
- Collection of autoregressive model implementations☆85Updated 7 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆100Updated 3 months ago
- Simple repository for training small reasoning models☆46Updated 10 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆51Updated last year
- My fork of Allen AI's OLMo for educational purposes.☆30Updated last year
- Simple GRPO scripts and configurations.☆59Updated 10 months ago
- Tina: Tiny Reasoning Models via LoRA☆309Updated 2 months ago
- ☆46Updated 8 months ago
- Verifiers for LLM Reinforcement Learning☆80Updated 7 months ago
- Exploring Applications of GRPO☆249Updated 3 months ago
- An extension of the nanoGPT repository for training small MOE models.☆215Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability.☆96Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 10 months ago
- ☆52Updated last year
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok …☆27Updated last month
- Esoteric Language Models☆107Updated last week
- ☆89Updated last year
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated last month
- Code for NeurIPS LLM Efficiency Challenge☆59Updated last year
- ☆124Updated 9 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆243Updated last year
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆61Updated last year
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆145Updated 10 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆223Updated last month
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆112Updated last month
- A collection of lightweight interpretability scripts to understand how LLMs think☆68Updated last week