mingyin0312 / RLFromScratch
☆395 Updated this week
Alternatives and similar repositories for RLFromScratch
Users who are interested in RLFromScratch are comparing it to the libraries listed below.
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆516 Updated last month
- Tina: Tiny Reasoning Models via LoRA ☆275 Updated last week
- Exploring Applications of GRPO ☆246 Updated last month
- minimal GRPO implementation from scratch ☆96 Updated 5 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆178 Updated 5 months ago
- rl from zero pretrain, can it be done? yes. ☆250 Updated last week
- Physics of Language Models, Part 4 ☆232 Updated 3 weeks ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆344 Updated 8 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆328 Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆116 Updated 3 months ago
- Scalable toolkit for efficient model reinforcement ☆626 Updated last week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆220 Updated last month
- Simple & Scalable Pretraining for Neural Architecture Research ☆287 Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆188 Updated 2 months ago
- Decentralized RL Training at Scale ☆441 Updated this week
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆429 Updated 3 months ago
- Normalized Transformer (nGPT) ☆186 Updated 9 months ago
- Nano repo for RL training of LLMs ☆63 Updated 2 weeks ago
- ☆211 Updated 6 months ago
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation ☆405 Updated 2 weeks ago
- A collection of tricks and tools to speed up transformer models ☆169 Updated 2 months ago
- Minimal hackable GRPO implementation ☆281 Updated 6 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆282 Updated last month
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆250 Updated 2 weeks ago
- ☆380 Updated this week
- The official GitHub repo for the survey paper "A Survey on Diffusion Language Models". ☆56 Updated last week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆97 Updated 3 weeks ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆272 Updated this week
- Esoteric Language Models ☆94 Updated 3 weeks ago
- Official repo of paper LM2 ☆41 Updated 6 months ago