Bruce-Lee-LY / cuda_back2back_hgemm
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with the MMA PTX instruction.
☆11 · Updated last year
Alternatives and similar repositories for cuda_back2back_hgemm:
Users who are interested in cuda_back2back_hgemm are comparing it to the repositories listed below:
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 · Updated 3 months ago
- An extension library of WMMA API (Tensor Core API) ☆87 · Updated 6 months ago
- CUDA Templates for Linear Algebra Subroutines ☆11 · Updated this week
- ☆38 · Updated 4 years ago
- Experiments and prototypes associated with IREE or MLIR ☆51 · Updated 5 months ago
- GPU Performance Advisor ☆63 · Updated 2 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆23 · Updated 3 years ago
- ☆66 · Updated 3 weeks ago
- Optimize GEMM with tensorcore step by step ☆19 · Updated last year
- MLIRX is now defunct. Please see PolyBlocks - https://docs.polymagelabs.com ☆38 · Updated last year
- Data-Centric MLIR dialect ☆40 · Updated last year
- CUTLASS and CuTe Examples ☆35 · Updated 2 weeks ago