mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 10 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- ☆48Updated last year
- minimal GRPO implementation from scratch☆102Updated 10 months ago
- Simple GRPO scripts and configurations.☆59Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆174Updated last year
- Simple repository for training small reasoning models☆47Updated 11 months ago
- Tina: Tiny Reasoning Models via LoRA☆314Updated 3 months ago
- Train your own SOTA deductive reasoning model☆107Updated 10 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆13Updated 3 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆244Updated last year
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆53Updated last year
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆140Updated 8 months ago
- Verifiers for LLM Reinforcement Learning☆79Updated 9 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆154Updated 11 months ago
- ☆52Updated last year
- An extension of the nanoGPT repository for training small MOE models.☆225Updated 10 months ago
- A collection of lightweight interpretability scripts to understand how LLMs think☆88Updated this week
- My fork of Allen AI's OLMo for educational purposes.☆30Updated last year
- Collection of autoregressive model implementations☆85Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- Exploring Applications of GRPO☆252Updated 4 months ago
- ☆86Updated last year
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆109Updated 10 months ago
- ☆123Updated 10 months ago
- This is the official repository for Inheritune.☆119Updated 11 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆59Updated 2 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆102Updated 4 months ago
- A pipeline for LLM knowledge distillation☆112Updated 9 months ago
- Code for NeurIPS LLM Efficiency Challenge☆59Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆126Updated 3 months ago
- ☆91Updated last year