ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆107 Updated last month
Alternatives and similar repositories for amd_matrix_instruction_calculator
Users who are interested in amd_matrix_instruction_calculator are comparing it to the libraries listed below.
- rocWMMA ☆118 Updated this week
- amdgpu example code in hip/asm ☆35 Updated last month
- ☆148 Updated this week
- ☆62 Updated 6 months ago
- An extension library of WMMA API (Tensor Core API) ☆99 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆246 Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆224 Updated 3 years ago
- Advanced Profiling and Analytics for AMD Hardware ☆159 Updated this week
- Dissecting NVIDIA GPU Architecture ☆99 Updated 3 years ago
- ☆102 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆191 Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆137 Updated this week
- ☆25 Updated 3 weeks ago
- collection of benchmarks to measure basic GPU capabilities ☆391 Updated 5 months ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆437 Updated this week
- ☆44 Updated 4 years ago
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆98 Updated last year
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆90 Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆91 Updated this week
- ☆247 Updated last month
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆172 Updated this week
- A CUTLASS implementation using SYCL ☆30 Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆133 Updated last year
- Profiling Tools Interfaces for GPU (PTI for GPU) is a set of Getting Started Documentation and Tools Library to start performance analysi… ☆231 Updated 2 weeks ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆106 Updated 7 years ago
- GPUOcelot: A dynamic compilation framework for PTX ☆201 Updated 5 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆40 Updated 11 months ago
- TPP experimentation on MLIR for linear algebra ☆132 Updated this week
- LLVM/MLIR based compiler instrumentation of AMD GPU kernels ☆18 Updated 2 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo. ☆100 Updated this week