mingyin0312 / RL4LLM
RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
☆28Updated last month
Alternatives and similar repositories for RL4LLM:
Users who are interested in RL4LLM are comparing it to the libraries listed below
- Train your own SOTA deductive reasoning model☆86Updated last month
- ☆47Updated 7 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆98Updated last week
- Train, tune, and infer Bamba model☆88Updated 3 months ago
- Collection of autoregressive model implementations☆85Updated 2 months ago
- Simple GRPO scripts and configurations.☆58Updated 2 months ago
- Set of scripts to finetune LLMs☆37Updated last year
- ☆46Updated last week
- Code for NeurIPS LLM Efficiency Challenge☆57Updated last year
- minimal GRPO implementation from scratch☆72Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆84Updated last month
- A simplified implementation for experimenting with Reinforcement Learning (RL) on GSM8K, inspired by RLVR and Deepseek R1. This repositor…☆75Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆170Updated 3 months ago
- A pipeline for LLM knowledge distillation☆100Updated 2 weeks ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k☆13Updated last month
- Repo for "Z1: Efficient Test-time Scaling with Code"☆53Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆55Updated 7 months ago
- ☆33Updated 10 months ago
- ☆77Updated 8 months ago
- A repository for research on medium sized language models.☆76Updated 10 months ago
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆60Updated 2 weeks ago
- ☆52Updated last month
- This is the official repository for Inheritune.☆111Updated 2 months ago
- Official repo of paper LM2☆37Updated 2 months ago
- Triton Implementation of HyperAttention Algorithm☆47Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers☆61Updated 2 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆97Updated 6 months ago
- ☆45Updated 3 weeks ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks☆142Updated 7 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆211Updated 5 months ago