amd / ZenDNN
☆105 · Updated last week
Alternatives and similar repositories for ZenDNN:
Users who are interested in ZenDNN are comparing it to the libraries listed below:
- Stretching GPU performance for GEMMs and tensor contractions. ☆232 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆79 · Updated this week
- AMD's graph optimization engine. ☆213 · Updated this week
- oneCCL Bindings for Pytorch* ☆89 · Updated last week
- OpenAI Triton backend for Intel® GPUs ☆168 · Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆128 · Updated last year
- rocWMMA ☆101 · Updated this week
- RCCL Performance Benchmark Tests ☆59 · Updated this week
- ROCm BLAS marshalling library ☆133 · Updated this week
- A collection of examples for the ROCm software stack ☆190 · Updated this week
- ☆137 · Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆224 · Updated last week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆135 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated 3 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆313 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆214 · Updated 3 years ago
- ☆249 · Updated this week
- ROCm Communication Collectives Library (RCCL) ☆304 · Updated this week
- ☆22 · Updated this week
- Development repository for the Triton language and compiler ☆109 · Updated this week
- ☆60 · Updated 2 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆362 · Updated this week
- An extension library of WMMA API (Tensor Core API) ☆91 · Updated 8 months ago
- MLIR-based partitioning system ☆71 · Updated this week
- ☆408 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆385 · Updated last month
- GPUOcelot: A dynamic compilation framework for PTX ☆178 · Updated last month
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆99 · Updated 2 weeks ago