amd / ZenDNN
☆106 Updated last month
Alternatives and similar repositories for ZenDNN
Users who are interested in ZenDNN are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆220 Updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆241 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆97 Updated this week
- oneCCL Bindings for Pytorch* ☆97 Updated last month
- AI Tensor Engine for ROCm ☆201 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆187 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆329 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆234 Updated last week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130 Updated last year
- ROCm BLAS marshalling library ☆143 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆400 Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆147 Updated last week
- rocWMMA ☆114 Updated this week
- ☆61 Updated 5 months ago
- AMD SMI ☆68 Updated this week
- Bandwidth test for ROCm ☆55 Updated last week
- ☆256 Updated this week
- Development repository for the Triton language and compiler ☆122 Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆61 Updated 2 months ago
- Ahead of Time (AOT) Triton Math Library ☆64 Updated this week
- ☆146 Updated this week
- ☆35 Updated last week
- CUDA Kernel Benchmarking Library ☆650 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆218 Updated 3 years ago
- RCCL Performance Benchmark Tests ☆67 Updated last week
- ROCm Device Libraries ☆97 Updated last year
- Training material for Nsight developer tools ☆157 Updated 9 months ago
- Computation using data flow graphs for scalable machine learning ☆67 Updated this week
- A collection of examples for the ROCm software stack ☆215 Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated this week