iVishalr / GEMM

A fast matrix multiplication implementation in the C programming language. The algorithm is similar to the one NumPy uses to compute dot products.

☆36

Alternatives and similar repositories for GEMM

Users who are interested in GEMM compare it to the repositories listed below.

leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆218 · Updated last year
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆139 · Updated 5 years ago
nod-ai / techtalks
☆15 · Updated last year
openxla / shardy
MLIR-based partitioning system
☆125 · Updated this week
sunlex0717 / DissectingTensorCores
☆106 · Updated last year
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆72 · Updated last month
bertmaher / simplegemm
☆115 · Updated 5 months ago
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆104 · Updated 3 years ago
nvixnu / pmpp__programming_massively_parallel_processors
Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…
☆72 · Updated 4 years ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆95 · Updated 2 months ago
libxsmm / tpp-mlir
TPP experimentation on MLIR for linear algebra
☆136 · Updated 3 weeks ago
CRobeck / instrument-amdgpu-kernels
LLVM/MLIR based compiler instrumentation of AMD GPU kernels
☆18 · Updated last month
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆138 · Updated last month
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆45 · Updated 2 weeks ago
makslevental / nelli
A lightweight, Pythonic frontend for MLIR
☆80 · Updated last year
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆103 · Updated last year
bondhugula / llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github…
☆33 · Updated last month
ColfaxResearch / cfx-article-src
☆136 · Updated 3 months ago
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆351 · Updated this week
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆183 · Updated 7 months ago
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆146 · Updated 3 months ago
roastduck / FreeTensor
A language and compiler for irregular tensor programs.
☆149 · Updated 9 months ago
iree-org / iree-turbine
IREE's PyTorch Frontend, based on Torch Dynamo.
☆95 · Updated this week
Lewuathe / mlir-hello
MLIR Sample dialect
☆126 · Updated 6 months ago
tenstorrent / tt-mlir
Tenstorrent MLIR compiler
☆174 · Updated this week
intel / graph-compiler
MLIR-based toolkit targeting Intel heterogeneous hardware
☆47 · Updated 6 months ago
mmperf / mmperf
MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels.
☆134 · Updated last year
makslevental / mlir-python-extras
The missing pieces (as far as boilerplate reduction goes) of the upstream MLIR Python bindings.
☆105 · Updated this week
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆275 · Updated this week
intel / mlir-extensions
Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.
☆142 · Updated this week