yanring / Megatron-MoE-ModelZoo

Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.

☆114

Alternatives and similar repositories for Megatron-MoE-ModelZoo

Users who are interested in Megatron-MoE-ModelZoo are comparing it to the libraries listed below.

RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆216 Updated last year
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆156 Updated 2 weeks ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆125 Updated 4 months ago
fzyzcjy / torch_memory_saver
Allow torch tensor memory to be released and resumed later
☆157 Updated this week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246 Updated 3 weeks ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆299 Updated 2 months ago
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆433 Updated 5 months ago
fzyzcjy / torch_utils
Utility scripts for PyTorch (e.g. Memory profiler that understands more low-level allocations such as NCCL)
☆62 Updated last month
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆142 Updated this week
mit-han-lab / Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
☆338 Updated 3 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆215 Updated this week
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆242 Updated 2 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆282 Updated last year
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆249 Updated 3 months ago
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆439 Updated this week
stanford-futuredata / stk
☆112 Updated last year
Victarry / PP-Schedule-Visualization
Pipeline Parallelism Emulation and Visualization
☆68 Updated 4 months ago
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆169 Updated last year
madsys-dev / deepseekv2-profile
☆148 Updated 7 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆264 Updated this week
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84 Updated last year
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without a performance drop.
☆261 Updated 3 weeks ago
usyd-fsalab / fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
☆266 Updated 3 months ago
yifuwang / symm-mem-recipes
☆141 Updated 10 months ago
InternLM / Awesome-LLM-Training-System
☆43 Updated last year
thunlp / TritonBench
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
☆87 Updated 4 months ago
Dao-AILab / grouped-latent-attention
☆130 Updated 4 months ago
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆147 Updated 2 weeks ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆320 Updated last year
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆637 Updated 2 weeks ago