mingyin0312 / RLFromScratch
☆437 Updated last month
Alternatives and similar repositories for RLFromScratch
Users who are interested in RLFromScratch are comparing it to the repositories listed below
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆535 Updated 2 months ago
- Tina: Tiny Reasoning Models via LoRA ☆284 Updated 2 weeks ago
- rl from zero pretrain, can it be done? yes. ☆274 Updated last week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆290 Updated 2 months ago
- Exploring Applications of GRPO ☆250 Updated last month
- ☆773 Updated 3 weeks ago
- ☆222 Updated this week
- Physics of Language Models, Part 4 ☆247 Updated 2 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆195 Updated 6 months ago
- minimal GRPO implementation from scratch ☆98 Updated 6 months ago
- Nano repo for RL training of LLMs ☆66 Updated last month
- Post-training with Tinker ☆550 Updated this week
- Scalable toolkit for efficient model reinforcement ☆910 Updated this week
- Simple & Scalable Pretraining for Neural Architecture Research ☆296 Updated last month
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆751 Updated last week
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆192 Updated 3 months ago
- [NeurIPS 2025] Reinforcement Learning for Reasoning in Large Language Models with One Training Example ☆360 Updated last week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆343 Updated 9 months ago
- Async RL Training at Scale ☆669 Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆220 Updated 3 weeks ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆151 Updated this week
- Minimal hackable GRPO implementation ☆286 Updated 8 months ago
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆337 Updated last month
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆290 Updated last week
- Dion optimizer algorithm ☆360 Updated last week
- Build your own visual reasoning model ☆409 Updated last month
- Open-source framework for the research and development of foundation models. ☆466 Updated this week
- Esoteric Language Models ☆99 Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆120 Updated 4 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆343 Updated 3 months ago