mingyin0312 / RLFromScratch
☆466 Updated 5 months ago
Alternatives and similar repositories for RLFromScratch
Users that are interested in RLFromScratch are comparing it to the libraries listed below
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆593 Updated 4 months ago
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ☆334 Updated 3 months ago
- rl from zero pretrain, can it be done? yes. ☆286 Updated 4 months ago
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆317 Updated last month
- Exploring Applications of GRPO ☆251 Updated 5 months ago
- [ICLR 2026] Tina: Tiny Reasoning Models via LoRA ☆319 Updated 4 months ago
- An extension of the nanoGPT repository for training small MOE models. ☆236 Updated 11 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆143 Updated 9 months ago
- ☆232 Updated 2 months ago
- ☆961 Updated 3 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆362 Updated this week
- A Gym for Agentic LLMs ☆444 Updated 3 weeks ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr… ☆322 Updated 3 months ago
- Minimal hackable GRPO implementation ☆323 Updated last year
- ☆388 Updated 3 months ago
- minimal GRPO implementation from scratch ☆102 Updated 10 months ago
- ☆394 Updated 2 weeks ago
- Official JAX implementation of End-to-End Test-Time Training for Long Context ☆520 Updated 2 weeks ago
- PyTorch-native post-training at scale ☆613 Updated this week
- Open-source framework for the research and development of foundation models. ☆752 Updated this week
- ☆105 Updated 6 months ago
- Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding ☆203 Updated last month
- OpenTinker is an RL-as-a-Service infrastructure for foundation models ☆625 Updated 2 weeks ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆221 Updated 3 months ago
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆858 Updated this week
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆452 Updated 4 months ago
- Ideas for projects related to Tinker ☆164 Updated 3 months ago
- A brief and partial summary of RLHF algorithms. ☆144 Updated 11 months ago
- [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a Pretraining Objective ☆232 Updated 2 weeks ago
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning ☆59 Updated last month