Muon is Scalable for LLM Training
☆1,487 · Aug 3, 2025 · Updated 10 months ago
Alternatives and similar repositories for Moonlight
Users that are interested in Moonlight are comparing it to the libraries listed below.
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,123 · Apr 3, 2025 · Updated last year
- Muon is an optimizer for hidden layers in neural networks ☆2,642 · May 24, 2026 · Updated 2 weeks ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆1,004 · Feb 5, 2026 · Updated 4 months ago
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. ☆2,962 · Jan 14, 2026 · Updated 4 months ago
- 🚀 Efficient implementations for emerging model architectures ☆5,182 · Updated this week
- FlashMLA: Efficient Multi-head Latent Attention Kernels ☆12,690 · Apr 30, 2026 · Updated last month
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆251 · Jun 15, 2025 · Updated 11 months ago
- ☆812 · Jun 9, 2025 · Updated last year
- Official Repo for Open-Reasoner-Zero