HabanaAI / Habana_Custom_Kernel
Provides examples for writing and building Habana custom kernels using HabanaTools
☆21 Updated 3 weeks ago
Alternatives and similar repositories for Habana_Custom_Kernel:
Users who are interested in Habana_Custom_Kernel are comparing it to the libraries listed below:
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- ☆96 Updated last year
- ☆31 Updated 2 years ago
- ☆104 Updated last month
- A CUTLASS implementation using SYCL ☆20 Updated this week
- ☆60 Updated 4 months ago
- ☆51 Updated 5 years ago
- ☆79 Updated 2 years ago
- ☆38 Updated 5 years ago
- GitHub mirror of triton-lang/triton repo. ☆26 Updated last week
- An extension library of WMMA API (Tensor Core API) ☆96 Updated 9 months ago
- ☆70 Updated 4 months ago
- Some source code about matrix multiplication implementation on CUDA ☆34 Updated 6 years ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆108 Updated 2 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆87 Updated 2 years ago
- Artifacts of EVT ASPLOS'24 ☆24 Updated last year
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆121 Updated 2 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆91 Updated 6 years ago
- Dissecting NVIDIA GPU Architecture ☆92 Updated 2 years ago
- A home for the final text of all TVM RFCs. ☆102 Updated 7 months ago
- Fast GPU based tensor core reductions ☆13 Updated 2 years ago
- A simple tool to profile performance of multiple combinations of GEMM of cuBLAS ☆25 Updated 4 years ago
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore. ☆50 Updated 9 months ago
- A Winograd Minimal Filter Implementation in CUDA ☆24 Updated 3 years ago
- ☆44 Updated 4 years ago
- ☆22 Updated 2 years ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆221 Updated 3 weeks ago
- ☆11 Updated last month
- System for automated integration of deep learning backends. ☆48 Updated 2 years ago
- ☆202 Updated 9 months ago