mingyin0312 / RLFromScratch
☆465 · Updated 5 months ago
Alternatives and similar repositories for RLFromScratch
Users who are interested in RLFromScratch are comparing it to the repositories listed below.
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆589 · Updated 3 months ago
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆316 · Updated last month
- ☆230 · Updated 2 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆334 · Updated 3 months ago
- rl from zero pretrain, can it be done? yes. ☆286 · Updated 4 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆233 · Updated 10 months ago
- Exploring Applications of GRPO ☆251 · Updated 5 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆321 · Updated 2 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆361 · Updated this week
- ☆957 · Updated 3 months ago
- PyTorch-native post-training at scale ☆605 · Updated last week
- minimal GRPO implementation from scratch ☆102 · Updated 10 months ago
- Tina: Tiny Reasoning Models via LoRA ☆316 · Updated 4 months ago
- Open-source release accompanying Gao et al. 2025 ☆501 · Updated last month
- ☆104 · Updated 6 months ago
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆830 · Updated this week
- Dion optimizer algorithm ☆424 · Updated 3 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆227 · Updated 3 months ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆511 · Updated last week
- [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective ☆231 · Updated last week
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆59 · Updated last month
- Minimal hackable GRPO implementation ☆321 · Updated last year
- ☆394 · Updated last week
- A Gym for Agentic LLMs ☆439 · Updated 2 weeks ago
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆623 · Updated last week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆143 · Updated 8 months ago
- ☆412 · Updated last year
- Open-source framework for the research and development of foundation models. ☆742 · Updated this week
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling. ☆358 · Updated 7 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆371 · Updated last year