mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆29Updated 3 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- ☆47Updated 9 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆105Updated 3 weeks ago
- minimal GRPO implementation from scratch☆90Updated 2 months ago
- Simple GRPO scripts and configurations.☆58Updated 4 months ago
- Collection of autoregressive model implementations☆85Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆90Updated 2 months ago
- Verifiers for LLM Reinforcement Learning☆56Updated last month
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning.☆95Updated 3 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆44Updated 8 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 3 months ago
- Repo for "Z1: Efficient Test-time Scaling with Code"☆59Updated last month
- DPO, but faster 🚀☆42Updated 5 months ago
- Official repo of paper LM2☆40Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 9 months ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆32Updated 2 weeks ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate"☆151Updated last month
- working implementation of DeepSeek MLA☆41Updated 4 months ago
- A repository for research on medium-sized language models.☆76Updated last year
- Simple repository for training small reasoning models☆31Updated 3 months ago
- ☆35Updated last week
- ☆39Updated last month
- ☆59Updated 10 months ago
- ☆79Updated 9 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains☆126Updated last week
- Tina: Tiny Reasoning Models via LoRA☆245Updated last week
- An extension of the nanoGPT repository for training small MOE models.☆147Updated 2 months ago
- minimal LLM scripts for 24GB VRAM GPUs. training, inference, whatever☆39Updated last week
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"☆54Updated 8 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆31Updated 2 months ago
- Set of scripts to finetune LLMs☆37Updated last year