MoonshotAI / Moonlight
Muon is Scalable for LLM Training
☆1,302 · Updated last month
Alternatives and similar repositories for Moonlight
Users who are interested in Moonlight are comparing it to the libraries listed below.
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆847 · Updated 5 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,899 · Updated 5 months ago
- slime is an LLM post-training framework for RL scaling. ☆1,652 · Updated this week
- ☆812 · Updated 3 months ago
- Dream 7B, a large diffusion language model ☆959 · Updated 3 weeks ago
- OLMoE: Open Mixture-of-Experts Language Models ☆857 · Updated 5 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,076 · Updated 2 weeks ago
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆1,050 · Updated 2 weeks ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆1,053 · Updated last month
- Scalable RL solution for advanced reasoning of language models ☆1,716 · Updated 5 months ago
- Muon is an optimizer for hidden layers in neural networks ☆1,672 · Updated 2 months ago
- Official Repo for Open-Reasoner-Zero ☆2,033 · Updated 3 months ago
- Parallel Scaling Law for Language Model: Beyond Parameter and Inference Time Scaling ☆438 · Updated 3 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,124 · Updated last month
- Ring attention implementation with flash attention ☆864 · Updated last month
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ☆740 · Updated 3 weeks ago
- An Open-source RL System from ByteDance Seed and Tsinghua AIR ☆1,537 · Updated 4 months ago
- Scalable toolkit for efficient model reinforcement ☆843 · Updated this week
- Large Reasoning Models ☆805 · Updated 9 months ago
- ☆1,122 · Updated last week
- MMaDA - Open-Sourced Multimodal Large Diffusion Language Models ☆1,341 · Updated 3 weeks ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆708 · Updated 3 months ago
- Fast, Flexible and Portable Structured Generation ☆1,215 · Updated this week
- [COLM 2025] LIMO: Less is More for Reasoning ☆1,015 · Updated last month
- An Open Large Reasoning Model for Real-World Solutions ☆1,515 · Updated 3 months ago
- Recipes to scale inference-time compute of open models ☆1,111 · Updated 3 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need ☆353 · Updated last week
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models ☆807 · Updated 2 months ago
- SkyRL: A Modular Full-stack RL Library for LLMs ☆818 · Updated this week
- Pretraining and inference code for a large-scale depth-recurrent language model ☆826 · Updated last week