mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆29 · Updated 4 months ago
Alternatives and similar repositories for RL4LLM
Users who are interested in RL4LLM are comparing it to the libraries listed below.
- Minimal GRPO implementation from scratch ☆92 · Updated 4 months ago
- ☆48 · Updated 10 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. ☆112 · Updated 5 months ago
- Tina: Tiny Reasoning Models via LoRA ☆266 · Updated last month
- Exploring Applications of GRPO ☆240 · Updated this week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 6 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆108 · Updated 2 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆229 · Updated 8 months ago
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- Simple GRPO scripts and configurations. ☆59 · Updated 5 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆63 · Updated 3 months ago
- ☆62 · Updated 11 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆95 · Updated last month
- An extension of the nanoGPT repository for training small MoE models. ☆160 · Updated 4 months ago
- ☆94 · Updated 3 months ago
- ☆56 · Updated 7 months ago
- Official repo of the paper LM2 ☆41 · Updated 5 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k ☆14 · Updated 4 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆87 · Updated 2 weeks ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679 ☆45 · Updated 10 months ago
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 7 months ago
- Verifiers for LLM Reinforcement Learning ☆64 · Updated 3 months ago
- This is the official repository for Inheritune. ☆111 · Updated 5 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆59 · Updated last year
- Train your own SOTA deductive reasoning model ☆96 · Updated 4 months ago
- Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools" ☆115 · Updated last month
- RL from zero pretrain: can it be done? We'll see. ☆65 · Updated 3 weeks ago
- ☆87 · Updated last year
- ☆52 · Updated 8 months ago
- ☆124 · Updated 9 months ago