flame / how-to-optimize-gemmLinks

☆1,902

Alternatives and similar repositories for how-to-optimize-gemm

Users that are interested in how-to-optimize-gemm are comparing it to the libraries listed below

Sorting:

flame / blislab
BLISlab: A Sandbox for Optimizing GEMM
☆531Updated 4 years ago
tpoisonooo / how-to-optimize-gemm
row-major matmul optimization
☆649Updated last year
pigirons / cpufp
A CPU tool for benchmarking the peak of floating points
☆557Updated 3 weeks ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆992Updated 10 months ago
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,102Updated 2 years ago
NVIDIA-developer-blog / code-samples
Source code examples from the Parallel Forall Blog
☆1,300Updated last year
OpenPPL / ppl.nn
A primitive library for neural network
☆1,346Updated 8 months ago
BBuf / tvm_mlir_learn
compiler learning resources collect.
☆2,468Updated 4 months ago
google / gemmlowp
Low-precision matrix multiplication
☆1,812Updated last year
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆646Updated 3 years ago
deeperlearning / professional-cuda-c-programming
☆450Updated 10 years ago
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆886Updated 7 months ago
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆726Updated 2 years ago
BBuf / how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
☆2,361Updated this week
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆619Updated 4 years ago
yzhaiustc / Optimizing-SGEMM-on-NVIDIA-Turing-GPUs
Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.
☆369Updated 7 months ago
buddy-compiler / buddy-mlir
An MLIR-based compiler framework bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
☆611Updated this week
libxsmm / libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
☆888Updated this week
Cjkkkk / CUDA_gemm
A simple high performance CUDA GEMM implementation.
☆392Updated last year
brucefan1983 / CUDA-Programming
Sample codes for my CUDA programming book
☆1,765Updated 5 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆340Updated 3 years ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,017Updated 2 years ago
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,765Updated last year
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆887Updated this week
LitLeo / TensorRT_Tutorial
☆1,030Updated last year
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,415Updated this week
cloudcores / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆523Updated 2 years ago
KEKE046 / mlir-tutorial
Hands-On Practical MLIR Tutorial
☆535Updated last year
wangzyon / NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
☆362Updated 3 years ago
sophgo / tpu-mlir
Machine learning compiler based on MLIR for Sophgo TPU.
☆765Updated last week