amd / ZenDNN
☆105, updated last week
Alternatives and similar repositories for ZenDNN:
Users who are interested in ZenDNN are comparing it to the libraries listed below.
- oneCCL Bindings for Pytorch* ☆93, updated this week
- Stretching GPU performance for GEMMs and tensor contractions. ☆235, updated this week
- AMD's graph optimization engine. ☆214, updated this week
- oneAPI Collective Communications Library (oneCCL) ☆232, updated last week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆130, updated last year
- AI Tensor Engine for ROCm ☆160, updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314, updated this week
- rocWMMA ☆106, updated this week
- Development repository for the Triton language and compiler ☆115, updated last week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆86, updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆376, updated this week
- OpenAI Triton backend for Intel® GPUs ☆178, updated this week
- Ahead of Time (AOT) Triton Math Library ☆57, updated this week
- Bandwidth test for ROCm ☆54, updated this week
- ☆27, updated this week
- ROCm Communication Collectives Library (RCCL) ☆317, updated this week
- RCCL Performance Benchmark Tests ☆63, updated this week
- ROCm BLAS marshalling library ☆136, updated this week
- ☆141, updated this week
- Benchmarks to capture important workloads. ☆31, updated 2 months ago
- ☆61, updated 3 months ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134, updated last week
- ☆250, updated this week
- ☆50, updated last year
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99, updated last month
- Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYC… ☆468, updated last week
- GPUOcelot: A dynamic compilation framework for PTX ☆185, updated 2 months ago
- ☆38, updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆216, updated 3 years ago
- AMD SMI ☆58, updated this week