NVIDIA / Megatron-EnergonLinks

Megatron's multi-modal data loader

☆280

Alternatives and similar repositories for Megatron-Energon

Users that are interested in Megatron-Energon are comparing it to the libraries listed below

Sorting:

NVIDIA-NeMo / Megatron-Bridge
HuggingFace conversion and training library for Megatron-based models
☆228Updated this week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271Updated last week
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆250Updated 3 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆314Updated 2 weeks ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆218Updated last year
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆130Updated 3 weeks ago
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆548Updated 6 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆151Updated last year
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392Updated 5 months ago
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: An Unified Checkpointing Library for LFMs
☆254Updated this week
Dao-AILab / grouped-latent-attention
☆132Updated 6 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆507Updated 9 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226Updated last year
Azure / MS-AMP
Microsoft Automatic Mixed Precision Library
☆628Updated this week
haoliuhl / ringattention
Large Context Attention
☆753Updated last month
gpu-mode / ring-attention
ring-attention experiments
☆160Updated last year
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆335Updated 9 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217Updated last week
stepfun-ai / Step3
☆439Updated 3 months ago
XunhaoLai / native-sparse-attention-triton
Efficient triton implementation of Native Sparse Attention.
☆251Updated 6 months ago
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆923Updated 2 months ago
yaof20 / Flash-RL
Implementation for FP8/INT8 Rollout for RL training without performence drop.
☆275Updated 3 weeks ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆253Updated 2 months ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆133Updated 2 years ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆170Updated last month
NVIDIA-NeMo / Automodel
Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆194Updated this week
huggingface / kernels
Load compute kernels from the Hub
☆337Updated last week
apple / ml-cross-entropy
☆555Updated 2 months ago
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆196Updated 2 months ago