mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆28Updated 2 months ago
Alternatives and similar repositories for RL4LLM
Users that are interested in RL4LLM are comparing it to the libraries listed below
- ☆47Updated 8 months ago
- Collection of autoregressive model implementation☆85Updated 3 weeks ago
- Simple GRPO scripts and configurations.☆58Updated 3 months ago
- Simple repository for training small reasoning models☆27Updated 3 months ago
- Set of scripts to finetune LLMs☆37Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆172Updated 3 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆103Updated last week
- My fork of Allen AI's OLMo for educational purposes.☆30Updated 5 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆110Updated 3 months ago
- minimal GRPO implementation from scratch☆88Updated 2 months ago
- Train your own SOTA deductive reasoning model☆92Updated 2 months ago
- ☆48Updated 6 months ago
- Verifiers for LLM Reinforcement Learning☆18Updated 3 weeks ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆14Updated 2 months ago
- ☆64Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 8 months ago
- ☆84Updated last week
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM☆54Updated last year
- ☆58Updated 9 months ago
- Code for paper called Self-Training Elicits Concise Reasoning in Large Language Models☆30Updated 3 weeks ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆36Updated last year
- Working implementation of DeepSeek MLA☆41Updated 4 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆90Updated 2 months ago
- ☆27Updated last month
- Code for NeurIPS LLM Efficiency Challenge☆57Updated last year
- ☆114Updated 2 months ago
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor…☆87Updated 3 months ago
- Small and Efficient Mathematical Reasoning LLMs☆71Updated last year
- Exploring Applications of GRPO☆206Updated last week
- Repo for "Z1: Efficient Test-time Scaling with Code"☆58Updated last month