NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆42 Updated this week
Related projects:
- ☆83 Updated 3 weeks ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆73 Updated 3 months ago
- Applied AI experiments and examples for PyTorch ☆123 Updated last month
- This repository contains the experimental PyTorch native float8 training UX ☆210 Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆50 Updated 3 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆150 Updated last week
- ☆151 Updated last year
- ☆66 Updated 3 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆47 Updated 2 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆144 Updated this week
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆183 Updated last month
- ☆50 Updated 3 months ago
- Torch Distributed Experimental ☆115 Updated last month
- Cataloging released Triton kernels. ☆111 Updated 3 weeks ago
- ring-attention experiments ☆89 Updated 5 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆66 Updated 3 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆166 Updated last month
- ☆75 Updated this week
- ☆130 Updated last year
- ☆68 Updated 2 months ago
- Low-bit optimizers for PyTorch ☆109 Updated 11 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆61 Updated 2 years ago
- ☆38 Updated 3 years ago
- A library for unit scaling in PyTorch ☆94 Updated 2 weeks ago
- ☆164 Updated 4 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆145 Updated this week
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆99 Updated 6 months ago
- ☆61 Updated 3 weeks ago
- ViT inference in Triton because, why not? ☆16 Updated 3 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆36 Updated 8 months ago