Azure / MS-AMP-Examples
Examples for the MS-AMP package.
☆25 · Updated 7 months ago
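For context, MS-AMP exposes an `initialize`-style entry point for enabling FP8 mixed precision on an existing PyTorch model. Below is a minimal usage sketch, assuming MS-AMP is installed per the Azure/MS-AMP instructions and an FP8-capable GPU is available; the model, optimizer, and training loop are illustrative placeholders, and `opt_level="O2"` is just one of the documented levels.

```python
# Minimal MS-AMP usage sketch (illustrative; assumes a CUDA GPU with FP8
# support and MS-AMP installed from the Azure/MS-AMP repository).
import torch
import msamp

model = torch.nn.Linear(1024, 1024).cuda()          # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# msamp.initialize swaps in FP8-aware module/optimizer implementations;
# opt_level controls how much state (weights, gradients, optimizer states)
# is kept in low precision.
model, optimizer = msamp.initialize(model, optimizer, opt_level="O2")

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()                   # toy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```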
Related projects
Alternatives and complementary repositories for MS-AMP-Examples
- Zero Bubble Pipeline Parallelism ☆279 · Updated last week
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers ☆196 · Updated 2 months ago
- ☆88 · Updated 2 months ago
- ☆65 · Updated 3 years ago
- PyTorch bindings for CUTLASS grouped GEMM (see the reference sketch after this list) ☆67 · Updated 3 months ago
- ☆26 · Updated 3 years ago
- A collection of memory efficient attention operators implemented in the Triton language ☆217 · Updated 5 months ago
- Microsoft Automatic Mixed Precision Library ☆523 · Updated last month
- ☆109 · Updated 7 months ago
- ☆156 · Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆64 · Updated 2 weeks ago
- This repository contains integer operators on GPUs for PyTorch ☆181 · Updated last year
- ☆74 · Updated 3 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM ☆53 · Updated last week
- pytorch-profiler ☆49 · Updated last year
- ☆162 · Updated 4 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters ☆34 · Updated 2 years ago
- Inference code for LLaMA models ☆19 · Updated 5 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆96 · Updated this week
- Research and development for optimizing transformers ☆124 · Updated 3 years ago
- ☆70 · Updated 2 years ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆228 · Updated last week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆196 · Updated 2 weeks ago
- FTPipe and related pipeline model parallelism research ☆41 · Updated last year
- ☆140 · Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆75 · Updated last week
- Applied AI experiments and examples for PyTorch ☆160 · Updated last week
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆353 · Updated last week
- A schedule language for large model training ☆141 · Updated 4 months ago
- A fast communication-overlapping library for tensor parallelism on GPUs ☆219 · Updated 2 weeks ago
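For the two CUTLASS grouped-GEMM binding entries above, the operation being fused is easy to state in plain PyTorch. The reference sketch below is illustrative only (the function name, shapes, and group sizes are made up, and it is not those repos' API); the actual bindings execute all groups in a single kernel instead of a Python loop.

```python
# Concept sketch of a grouped GEMM: each contiguous row-block of `a` is
# multiplied by its own weight matrix from the stack `b`. The CUTLASS-backed
# bindings fuse this loop into one GPU kernel launch.
import torch

def grouped_gemm_reference(a: torch.Tensor, b: torch.Tensor,
                           group_sizes: list[int]) -> torch.Tensor:
    """Multiply row-blocks of `a` (sizes given by `group_sizes`) by the
    per-group weight matrices stacked in `b`, of shape (num_groups, k, n)."""
    outs, start = [], 0
    for g, rows in enumerate(group_sizes):
        outs.append(a[start:start + rows] @ b[g])
        start += rows
    return torch.cat(outs, dim=0)

# e.g. a mixture-of-experts layer routing three token groups to three experts:
a = torch.randn(6, 8)        # 6 tokens, hidden size 8
b = torch.randn(3, 8, 4)     # 3 experts, each an 8x4 projection
out = grouped_gemm_reference(a, b, group_sizes=[2, 1, 3])
print(out.shape)             # torch.Size([6, 4])
```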