shixun404 / Fault-Tolerant-SGEMM-on-NVIDIA-GPUs

Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs

☆12

Alternatives and similar repositories for Fault-Tolerant-SGEMM-on-NVIDIA-GPUs

Users who are interested in Fault-Tolerant-SGEMM-on-NVIDIA-GPUs are comparing it to the libraries listed below.

temporal-hpc / reduction-tensor-cores
Fast GPU based tensor core reductions
☆13 Updated 2 years ago
ROCm / rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
☆127 Updated this week
SpRegTiling / sparse-register-tiling
☆10 Updated last year
merthidayetoglu / HiCCL
A hierarchical collective communications library with portable optimizations
☆36 Updated 11 months ago
Jokeren / GPA
GPU Performance Advisor
☆65 Updated 3 years ago
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆28 Updated 5 months ago
NMSU-PEARL / PPT-GPU
Performance Prediction Toolkit for GPUs
☆39 Updated 3 years ago
sunlex0717 / DissectingTensorCores
☆109 Updated last year
AlphaSparse / Library
A sparse BLAS lib supporting multiple backends
☆48 Updated last week
olcf / NVIDIA-tensor-core-examples
☆20 Updated 6 years ago
hpdps-group / COCCL
COCCL: Compression and precision co-aware collective communication library
☆27 Updated 8 months ago
cyanguwa / nersc-roofline
☆48 Updated 5 years ago
wudu98 / autoGEMM
☆14 Updated 11 months ago
spcl / atlahs
ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage
☆51 Updated last week
apuaaChen / vectorSparse
☆32 Updated 3 years ago
intel / sycl-tla
SYCL* Templates for Linear Algebra (SYCL*TLA) - SYCL based CUTLASS implementation for Intel GPUs
☆51 Updated this week
hibagus / CUDA_Bench
CUDA GPU Benchmark
☆35 Updated 9 months ago
microsoft / ConvStencil
☆33 Updated last year
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
c3sr / tcu_scope
☆50 Updated 6 years ago
uuudown / Tartan
Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite
☆66 Updated 7 years ago
north-numerical-computing / tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
☆41 Updated last year
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆28 Updated last year
sjfeng1999 / gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
☆110 Updated 3 years ago
HAWAIILAB / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
☆32 Updated 4 years ago
SuperScientificSoftwareLaboratory / DASP
Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…
☆27 Updated last year
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆73 Updated 5 years ago
eth-cscs / Tiled-MM
Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.
☆32 Updated 7 months ago
SuperScientificSoftwareLaboratory / TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y…
☆42 Updated last year
gunrock / loops
🎃 GPU load-balancing library for regular and irregular computations.
☆62 Updated 2 months ago