mingyin0312 / RLFromScratch
☆465 Updated 4 months ago
Alternatives and similar repositories for RLFromScratch
Users who are interested in RLFromScratch are comparing it to the libraries listed below.
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆576 Updated 3 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆328 Updated 2 months ago
- Tina: Tiny Reasoning Models via LoRA ☆314 Updated 3 months ago
- Exploring Applications of GRPO ☆252 Updated 4 months ago
- ☆224 Updated last month
- An extension of the nanoGPT repository for training small MOE models. ☆225 Updated 10 months ago
- Based on Nano-vLLM, a simple replication of vLLM with self-contained paged attention and flash attention implementation ☆184 Updated this week
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆312 Updated 2 months ago
- rl from zero pretrain, can it be done? yes. ☆286 Updated 3 months ago
- Physics of Language Models, Part 4 ☆303 Updated last week
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆140 Updated 8 months ago
- ☆949 Updated 2 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆346 Updated 3 weeks ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆297 Updated 2 weeks ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆370 Updated last year
- Miles is an enterprise-facing reinforcement learning framework for large-scale MoE post-training and production workloads, forked from an… ☆714 Updated this week
- minimal GRPO implementation from scratch ☆102 Updated 10 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆421 Updated 4 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆225 Updated 7 months ago
- Normalized Transformer (nGPT) ☆195 Updated last year
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆614 Updated last week
- Dion optimizer algorithm ☆416 Updated last week
- PyTorch-native post-training at scale ☆585 Updated last week
- A Gym for Agentic LLMs ☆420 Updated 2 weeks ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆226 Updated 2 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆306 Updated last month
- Minimal hackable GRPO implementation ☆315 Updated 11 months ago
- Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ☆393 Updated 3 weeks ago
- A brief and partial summary of RLHF algorithms. ☆142 Updated 10 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆198 Updated last month