Azure / MS-AMP-Examples
Examples for MS-AMP package.
☆29 · Updated 3 weeks ago
Alternatives and similar repositories for MS-AMP-Examples
Users who are interested in MS-AMP-Examples are comparing it to the libraries listed below.
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆212 · Updated 11 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆107 · Updated 2 months ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE. ☆45 · Updated last week
- Odysseus: Playground of LLM Sequence Parallelism ☆72 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆134 · Updated 3 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆199 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆221 · Updated last month
- ☆107 · Updated 11 months ago
- Zero Bubble Pipeline Parallelism ☆415 · Updated 3 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆128 · Updated 11 months ago
- Summary of system papers/frameworks/codes/tools on training or serving large models ☆57 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆80 · Updated 11 months ago
- ☆123 · Updated 2 months ago
- A collection of memory efficient attention operators implemented in the Triton language. ☆275 · Updated last year
- ☆75 · Updated 4 years ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters ☆46 · Updated last year
- ☆85 · Updated 3 years ago
- 16-fold memory access reduction with nearly no loss ☆103 · Updated 4 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆230 · Updated 8 months ago
- ☆42 · Updated 2 years ago
- This repository contains integer operators on GPUs for PyTorch. ☆211 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆118 · Updated 8 months ago
- ☆120 · Updated last year
- Megatron's multi-modal data loader ☆232 · Updated last week
- Pipeline Parallelism Emulation and Visualization ☆54 · Updated last month
- ☆158 · Updated last year
- Applied AI experiments and examples for PyTorch ☆289 · Updated 2 months ago
- A Quirky Assortment of CuTe Kernels ☆388 · Updated this week
- ☆137 · Updated 5 months ago