HabanaAI / Habana_Custom_Kernel

Provides examples for writing and building Habana custom kernels using the HabanaTools

☆22

Alternatives and similar repositories for Habana_Custom_Kernel

Users who are interested in Habana_Custom_Kernel are comparing it to the libraries listed below.

c3sr / tcu_scope
☆51 Updated 6 years ago
sunlex0717 / DissectingTensorCores
☆106 Updated last year
intel / cutlass-sycl
A CUTLASS implementation using SYCL
☆32 Updated this week
apuaaChen / vectorSparse
☆32 Updated 2 years ago
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆26 Updated last year
temporal-hpc / reduction-tensor-cores
Fast GPU based tensor core reductions
☆13 Updated 2 years ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆103 Updated 3 years ago
lixiuhong / batched_gemm
☆39 Updated 5 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 Updated 4 years ago
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆99 Updated last year
daadaada / gas
☆45 Updated 4 years ago
parasailteam / coconet
☆80 Updated 2 years ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆72 Updated 4 years ago
hgyhungry / ShflBW_Sparse_NN
☆16 Updated 2 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆35 Updated 5 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆40 Updated last year
ColfaxResearch / cfx-article-src
☆127 Updated 3 months ago
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆114 Updated 2 years ago
hgyhungry / alcop-artifact
☆23 Updated 2 years ago
Jokeren / GPA
GPU Performance Advisor
☆65 Updated 3 years ago
intel / xetla
☆62 Updated 7 months ago
nox-410 / tvm.tl
An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.
☆50 Updated last year
Huanghongru / SGEMM-Implementation-and-Optimization
Some source code about matrix multiplication implementation on CUDA
☆34 Updated 6 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆137 Updated 2 years ago
microsoft / ConvStencil
☆31 Updated last year
HPMLL / DTC-SpMM_ASPLOS24
☆33 Updated last year
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆48 Updated this week
hgyhungry / ge-spmm
☆109 Updated 4 years ago
GVProf / GVProf
GVProf: A Value Profiler for GPU-based Clusters
☆51 Updated last year