ROCm / MAD
☆21 Updated last week
Alternatives and similar repositories for MAD
Users who are interested in MAD are comparing it to the libraries listed below.
- RCCL Performance Benchmark Tests ☆70 Updated this week
- Ongoing research training transformer models at scale ☆25 Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆61 Updated 2 weeks ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆147 Updated 2 weeks ago
- ☆40 Updated last week
- AI Tensor Engine for ROCm ☆232 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆91 Updated last week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆251 Updated 3 weeks ago
- Microsoft Collective Communication Library ☆64 Updated 7 months ago
- OpenAI Triton backend for Intel® GPUs ☆193 Updated this week
- oneCCL Bindings for Pytorch* ☆99 Updated last week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆383 Updated this week
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆63 Updated this week
- A CUTLASS implementation using SYCL ☆30 Updated last week
- Ahead of Time (AOT) Triton Math Library ☆70 Updated last week
- ☆48 Updated this week
- ☆104 Updated last year
- Bandwidth test for ROCm ☆60 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆349 Updated this week
- ☆100 Updated 6 months ago
- Development repository for the Triton language and compiler ☆125 Updated this week
- End to End steps for adding custom ops in PyTorch. ☆23 Updated 4 years ago
- ☆62 Updated 7 months ago
- ☆25 Updated 3 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆343 Updated this week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆107 Updated last month
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆437 Updated this week
- Synthesizer for optimal collective communication algorithms ☆110 Updated last year
- ☆216 Updated last year
- A lightweight design for computation-communication overlap. ☆146 Updated 3 weeks ago