zpzim / MSplitGEMM

Large matrix multiplication in CUDA

☆15

Alternatives and similar repositories for MSplitGEMM:

Users who are interested in MSplitGEMM are comparing it to the libraries listed below.

weifengliu-ssslab / Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on nVidia and AMD GPUs
☆45 Updated 8 years ago
hclhkbu / gcoospdm
Sparse-dense matrix-matrix multiplication on GPUs
☆15 Updated 6 years ago
EBD-CREST / nsparse
Sparse matrix computation library for GPU
☆54 Updated 4 years ago
dumerrill / merge-spmv
☆93 Updated 8 years ago
ap-hynninen / cutt
CUDA Tensor Transpose (cuTT) library
☆51 Updated 7 years ago
poojahira / spmv-cuda
Implementation and analysis of five different GPU based SPMV algorithms in CUDA
☆38 Updated 6 years ago
codyjrivera / tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
☆32 Updated 4 years ago
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆50 Updated last year
GPUPeople / ACSpGEMM
Repository holding the code base for AC-SpGEMM: "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"
☆28 Updated 4 years ago
weifengliu-ssslab / bhSPARSE
bhSPARSE: A Sparse BLAS Library
☆16 Updated 9 years ago
hpcgarage / ParTI
Parallel Tensor Infrastructure (ParTI!)
☆28 Updated 4 years ago
CoffeeBeforeArch / nvbit_tools
☆11 Updated 4 years ago
ShadenSmith / splatt
The Surprisingly ParalleL spArse Tensor Toolkit.
☆70 Updated 2 years ago
eth-cscs / COSMA
Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm
☆198 Updated 2 months ago
weifengliu-ssslab / Benchmark_SpMV_using_CSR
CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)
☆26 Updated 9 years ago
pnnl / s-blas
This package includes the implementation for four sparse linear algebra kernels: Sparse-Matrix-Vector-Multiplication (SpMV), Sparse-Trian…
☆26 Updated 4 years ago
CSshengxy / MEC
ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 Updated 5 years ago
LLNL / acrotensor
A C++ library for computing large scale tensor contractions.
☆36 Updated 6 years ago
weifengliu-ssslab / Benchmark_SpMV_using_CSR5
CSR5-based SpMV on CPUs, GPUs and Xeon Phi
☆102 Updated 8 months ago
rox906 / tcFFT
☆37 Updated 3 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆124 Updated 4 years ago
CNugteren / CLTune
CLTune: An automatic OpenCL & CUDA kernel tuner
☆173 Updated 2 years ago
tbennun / mgbench
Multi-GPU Computing Benchmark Suite (CUDA)
☆42 Updated 7 years ago
pigirons / spmv
A tuned sparse matrix-dense vector multiplication (SpMV) library
☆21 Updated 8 years ago
kberkay / Cuda-Matrix-Multiplication
Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts
☆24 Updated 2 years ago
lightsighter / Weft
A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels
☆18 Updated 9 years ago
patflick / miopen-benchmark
Benchmarking MIOpen
☆17 Updated 6 years ago
UoB-HPC / hpc-course-examples
Examples for HPC course
☆39 Updated 3 years ago
chenxuhao / caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs
☆15 Updated 5 years ago
md2z34 / winograd_gpu
GPU implementation of Winograd convolution
☆10 Updated 7 years ago