mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆31Updated 8 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- ☆48Updated last year
- Simple GRPO scripts and configurations.☆59Updated 9 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 10 months ago
- minimal GRPO implementation from scratch☆99Updated 8 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆125Updated 6 months ago
- Simple repository for training small reasoning models☆45Updated 9 months ago
- Collection of autoregressive model implementations☆86Updated 6 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆50Updated last year
- Tina: Tiny Reasoning Models via LoRA☆304Updated last month
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆242Updated last year
- Set of scripts to finetune LLMs☆38Updated last year
- ☆46Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- Verifiers for LLM Reinforcement Learning☆79Updated 7 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"☆101Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆162Updated 7 months ago
- ☆98Updated 7 months ago
- This is the official repository for Inheritune.☆115Updated 9 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆144Updated 9 months ago
- ☆86Updated last year
- Train your own SOTA deductive reasoning model☆108Updated 8 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆109Updated 5 months ago
- ☆124Updated 8 months ago
- Exploring Applications of GRPO☆248Updated 2 months ago
- ☆119Updated last year
- A repository for research on medium sized language models.☆78Updated last year
- ☆68Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆111Updated last month
- Codebase accompanying the Summary of a Haystack paper.☆79Updated last year