Azure / MS-AMP-Examples
Examples for the MS-AMP package. (☆29, updated last month)

Alternatives and similar repositories for MS-AMP-Examples
Users interested in MS-AMP-Examples are comparing it to the libraries listed below.
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training (☆214, updated last year)
- Best practices for training DeepSeek, Mixtral, Qwen, and other MoE models using Megatron Core (☆64, updated last week)
- PyTorch bindings for CUTLASS grouped GEMM (☆110, updated 3 months ago)
- Odysseus: Playground of LLM Sequence Parallelism (☆76, updated last year)
- [ICLR 2025] COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training (☆234, updated 3 weeks ago)
- PyTorch bindings for CUTLASS grouped GEMM (☆138, updated this week)
- ☆110, updated last year
- ☆85, updated 3 years ago
- A collection of memory-efficient attention operators implemented in the Triton language (☆278, updated last year)
- Summary of system papers, frameworks, code, and tools for training or serving large models (☆57, updated last year)
- 16-fold memory access reduction with nearly no loss (☆104, updated 5 months ago)
- Triton-based implementation of Sparse Mixture of Experts (☆236, updated last week)
- ☆121, updated last year
- ☆159, updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity (☆81, updated 11 months ago)
- Distributed IO-aware Attention algorithm (☆21, updated last year)
- Pipeline Parallelism Emulation and Visualization (☆63, updated 2 months ago)
- ☆75, updated 4 years ago
- Utility scripts for PyTorch (e.g., a memory profiler that understands lower-level allocations such as NCCL) (☆50, updated 3 weeks ago)
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference (☆119, updated last year)
- ☆42, updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (☆209, updated last week)
- Tritonbench: a collection of PyTorch custom operators with example inputs to measure their performance (☆214, updated this week)
- Zero Bubble Pipeline Parallelism (☆423, updated 3 months ago)
- ☆123, updated 3 months ago
- Training library for Megatron-based models (☆54, updated this week)
- This repository contains integer operators on GPUs for PyTorch (☆213, updated last year)
- Triton implementation of FlashAttention2 that adds custom masks (☆133, updated last year)
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) (☆261, updated last month)
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM (☆165, updated last year)