MoonshotAI / Moonlight
Muon is Scalable for LLM Training
★1,043, updated last month
Alternatives and similar repositories for Moonlight:
Users who are interested in Moonlight are comparing it to the repositories listed below:
- MoBA: Mixture of Block Attention for Long-Context LLMs (★1,771, updated last month)
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" (★653, updated last month)
- Scalable RL solution for advanced reasoning of language models (★1,537, updated last month)
- Official Repo for Open-Reasoner-Zero (★1,912, updated last month)
- Recipes to scale inference-time compute of open models (★1,068, updated this week)
- Dream 7B, a large diffusion language model (★622, updated last week)
- (★739, updated 3 weeks ago)
- Understanding R1-Zero-Like Training: A Critical Perspective (★915, updated 3 weeks ago)
- Large Reasoning Models (★804, updated 5 months ago)
- Muon optimizer: +>30% sample efficiency with <3% wallclock overhead (★611, updated last month)
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities (★829, updated 3 weeks ago)
- An Open Large Reasoning Model for Real-World Solutions (★1,488, updated 2 months ago)
- An Open-source RL System from ByteDance Seed and Tsinghua AIR (★1,219, updated last month)
- (★683, updated last week)
- Minimalistic large language model 3D-parallelism training (★1,850, updated this week)
- Training Large Language Model to Reason in a Continuous Latent Space (★1,104, updated 3 months ago)
- O1 Replication Journey (★1,987, updated 3 months ago)
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. (★1,772, updated this week)
- OLMoE: Open Mixture-of-Experts Language Models (★739, updated last month)
- LIMO: Less is More for Reasoning (★933, updated last month)
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper (★613, updated last month)
- OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models (★1,765, updated 3 months ago)
- (★928, updated 3 months ago)
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… (★1,013, updated last week)
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL (★2,315, updated this week)
- (★684, updated 3 weeks ago)
- Fast, Flexible and Portable Structured Generation (★922, updated this week)
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. (★723, updated 7 months ago)
- Simple RL training for reasoning (★3,540, updated last month)
- A fork to add multimodal model training to open-r1 (★1,252, updated 3 months ago)