amd / ZenDNN
☆106 Updated 3 weeks ago
Alternatives and similar repositories for ZenDNN:
Users who are interested in ZenDNN are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆216 Updated this week
- AI Tensor Engine for ROCm ☆187 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆237 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆93 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆393 Updated this week
- Bandwidth test for ROCm ☆54 Updated 3 weeks ago
- OpenAI Triton backend for Intel® GPUs ☆183 Updated this week
- oneCCL Bindings for Pytorch* ☆95 Updated last week
- Development repository for the Triton language and compiler ☆118 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆232 Updated last week
- ROCm BLAS marshalling library ☆140 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 Updated last year
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆324 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆330 Updated this week
- ☆60 Updated 4 months ago
- rocWMMA ☆110 Updated this week
- ☆142 Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆413 Updated 3 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 Updated 2 months ago
- ☆60 Updated last year
- Ahead of Time (AOT) Triton Math Library ☆62 Updated last week
- ☆30 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆218 Updated 3 years ago
- Shared Middle-Layer for Triton Compilation ☆246 Updated 2 weeks ago
- Computation using data flow graphs for scalable machine learning ☆67 Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆144 Updated last week
- ☆413 Updated this week
- ROCm Device Libraries ☆97 Updated last year
- A library to analyze PyTorch traces. ☆367 Updated last week
- MLIR-based partitioning system ☆82 Updated this week