IBM / pytorch-communication-benchmarks
PyTorch code examples for measuring the performance of collective communication calls in AI workloads
★18 Updated 7 months ago
Alternatives and similar repositories for pytorch-communication-benchmarks
Users who are interested in pytorch-communication-benchmarks are comparing it to the repositories listed below
- Memory Optimizations for Deep Learning (ICML 2023) ★64 Updated last year
- A bunch of kernels that might make stuff slower ★46 Updated this week
- Extensible collectives library in Triton ★87 Updated 2 months ago
- A simple calculation for LLM MFU. ★38 Updated 3 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ★116 Updated 6 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ★127 Updated this week
- MLPerf™ logging library ★36 Updated last month
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ★88 Updated this week
- High-speed GEMV kernels, at most 2.7x speedup compared to PyTorch baseline. ★109 Updated 10 months ago
- A minimal implementation of vLLM. ★41 Updated 10 months ago
- A lightweight design for computation-communication overlap. ★132 Updated 3 weeks ago
- Framework to reduce autotune overhead to zero for well-known deployments. ★74 Updated 2 weeks ago
- Benchmarks to capture important workloads. ★31 Updated 4 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ★65 Updated 3 years ago
- ★71 Updated 2 months ago
- ★50 Updated last year
- ★96 Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ★75 Updated 9 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ★79 Updated 6 months ago
- Odysseus: Playground of LLM Sequence Parallelism ★69 Updated 11 months ago
- ★208 Updated 10 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency ★109 Updated 8 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ★13 Updated 5 months ago
- Applied AI experiments and examples for PyTorch ★271 Updated this week
- LLM-Inference-Bench ★43 Updated 4 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ★251 Updated 7 months ago
- ★86 Updated 5 months ago
- ★105 Updated 9 months ago
- ★85 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ★64 Updated last week