NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆304 · Updated last week
Alternatives and similar repositories for Megatron-Energon
Users that are interested in Megatron-Energon are comparing it to the libraries listed below
- Training library for Megatron-based models with bi-directional Hugging Face conversion capability ☆347 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆278 · Updated last month
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆257 · Updated 5 months ago
- PyTorch Distributed native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆245 · Updated this week
- 🔥 A minimal training framework for scaling FLA models ☆335 · Updated 2 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆519 · Updated 11 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆149 · Updated 3 weeks ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆258 · Updated last month
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆220 · Updated last year
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆160 · Updated last year
- Efficient LLM Inference over Long Sequences ☆393 · Updated 6 months ago
- Ring attention implementation with flash attention ☆961 · Updated 4 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆621 · Updated 3 weeks ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆522 · Updated last week
- PyTorch bindings for CUTLASS grouped GEMM. ☆180 · Updated 3 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆549 · Updated 8 months ago
- Microsoft Automatic Mixed Precision Library ☆635 · Updated last month
- Efficient triton implementation of Native Sparse Attention. ☆258 · Updated 7 months ago
- ☆443 · Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆218 · Updated this week
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆281 · Updated 2 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆140 · Updated 7 months ago
- ☆133 · Updated 7 months ago
- Large Context Attention ☆759 · Updated 3 months ago
- An industrial extension library of PyTorch to accelerate large-scale model training ☆57 · Updated 5 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆225 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆259 · Updated 3 months ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ☆607 · Updated this week
- Zero Bubble Pipeline Parallelism ☆445 · Updated 8 months ago