google-research / sputnik

A library of GPU kernels for sparse matrix operations.

☆275

Alternatives and similar repositories for sputnik

Users who are interested in sputnik are comparing it to the libraries listed below.

parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆136 Updated 3 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆230 Updated 3 years ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆50 Updated 7 years ago
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138 Updated 2 years ago
ColfaxResearch / cutlass-kernels
☆240 Updated last year
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆73 Updated 5 years ago
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆357 Updated last week
awslabs / raf
☆145 Updated 8 months ago
spcl / substation
Research and development for optimizing transformers
☆131 Updated 4 years ago
sunlex0717 / DissectingTensorCores
☆109 Updated last year
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆143 Updated 5 years ago
leimao / CUDA-GEMM-Optimization
CUDA Matrix Multiplication Optimization
☆230 Updated last year
wmmae / wmma_extension
An extension library of WMMA API (Tensor Core API)
☆106 Updated last year
albanD / subclass_zoo
☆178 Updated last year
iree-org / iree-nvgpu
☆50 Updated last year
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆199 Updated 3 years ago
ColfaxResearch / cfx-article-src
☆150 Updated 5 months ago
facebookexperimental / triton
GitHub mirror of the triton-lang/triton repo.
☆86 Updated this week
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122 Updated 3 years ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.
☆116 Updated last year
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆389 Updated last week
yifuwang / symm-mem-recipes
☆141 Updated 9 months ago
ROCm / aotriton
Ahead of Time (AOT) Triton Math Library
☆79 Updated this week
microsoft / SparTA
☆153 Updated last year
triton-lang / kernels
☆92 Updated 11 months ago
oresths / tSparse
A GPU algorithm for sparse matrix-matrix multiplication
☆72 Updated 5 years ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆292 Updated 2 weeks ago
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core, comparing both hand-engineered and codegen kernels.
☆134 Updated 2 years ago
tlc-pack / relax
☆193 Updated 2 years ago