ROCm / MAD
☆19 Updated last week
Alternatives and similar repositories for MAD
Users who are interested in MAD are comparing it to the repositories listed below.
- Ongoing research training transformer models at scale ☆23 Updated 2 weeks ago
- RCCL Performance Benchmark Tests ☆68 Updated last month
- ☆38 Updated this week
- oneCCL Bindings for Pytorch* ☆97 Updated 2 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆90 Updated this week
- A CUTLASS implementation using SYCL ☆27 Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆61 Updated last week
- AI Tensor Engine for ROCm ☆208 Updated this week
- ☆20 Updated 3 months ago
- ☆90 Updated 6 months ago
- ☆62 Updated 6 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆106 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆427 Updated this week
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆41 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆84 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆342 Updated this week
- ☆48 Updated this week
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆145 Updated this week
- ☆117 Updated last month
- oneAPI Collective Communications Library (oneCCL) ☆237 Updated 2 weeks ago
- Training material for Nsight developer tools ☆159 Updated 10 months ago
- Development repository for the Triton language and compiler ☆125 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆191 Updated this week
- Advanced Profiling and Analytics for AMD Hardware ☆157 Updated this week
- ☆25 Updated last week
- Multi-GPU communication profiler and visualizer ☆30 Updated last year
- End to End steps for adding custom ops in PyTorch. ☆23 Updated 4 years ago
- ☆98 Updated last year
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆379 Updated this week
- ☆60 Updated last year