amd / ZenDNN
☆107 Updated last week
Alternatives and similar repositories for ZenDNN
Users who are interested in ZenDNN are comparing it to the libraries listed below.
- AMD's graph optimization engine. ☆223 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆245 Updated this week
- AI Tensor Engine for ROCm ☆207 Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆337 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆190 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆237 Updated last week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆423 Updated this week
- Bandwidth test for ROCm ☆58 Updated last month
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 Updated last year
- ROCm BLAS marshalling library ☆144 Updated this week
- oneCCL Bindings for Pytorch* ☆97 Updated last month
- rocWMMA ☆115 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆103 Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆459 Updated 2 months ago
- ☆62 Updated 6 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆221 Updated 3 years ago
- Training material for Nsight developer tools ☆159 Updated 10 months ago
- ☆148 Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆83 Updated this week
- Ahead of Time (AOT) Triton Math Library ☆66 Updated this week
- Next generation BLAS implementation for ROCm platform ☆381 Updated this week
- Benchmarks to capture important workloads. ☆31 Updated 4 months ago
- Development repository for the Triton language and compiler ☆125 Updated this week
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated last month
- RCCL Performance Benchmark Tests ☆67 Updated last month
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆148 Updated this week
- MLIR-based partitioning system ☆91 Updated this week
- HIPIFY: Convert CUDA to Portable C++ Code ☆587 Updated this week
- HIPCC: HIP compiler driver ☆40 Updated last year
- An extension library of WMMA API (Tensor Core API) ☆99 Updated 11 months ago