mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆30Updated 6 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- Minimal GRPO implementation from scratch☆96Updated 5 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆117Updated 3 months ago
- ☆48Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆172Updated 7 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆121Updated 6 months ago
- Collection of autoregressive model implementations☆86Updated 4 months ago
- Tina: Tiny Reasoning Models via LoRA☆278Updated 3 weeks ago
- An extension of the nanoGPT repository for training small MOE models.☆181Updated 5 months ago
- Verifiers for LLM Reinforcement Learning☆71Updated 4 months ago
- Exploring Applications of GRPO☆246Updated last week
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 6 months ago
- Simple GRPO scripts and configurations.☆59Updated 6 months ago
- The official implementation for the paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆94Updated this week
- Distributed training (multi-node) of a Transformer model☆80Updated last year
- Simple repository for training small reasoning models☆38Updated 6 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati…☆46Updated 3 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆96Updated last month
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆47Updated 3 months ago
- Train your own SOTA deductive reasoning model☆104Updated 5 months ago
- MatFormer repo☆62Updated 8 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆105Updated 5 months ago
- My fork of Allen AI's OLMo for educational purposes.☆30Updated 8 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆239Updated 10 months ago
- Set of scripts to fine-tune LLMs☆37Updated last year
- ☆121Updated 6 months ago
- This is the official repository for Inheritune.☆112Updated 6 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆104Updated 2 months ago
- ☆88Updated last year
- ☆51Updated last year
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆48Updated 11 months ago