uuudown / S-BLAS

This package implements Sparse Matrix-Vector Multiplication (SpMV) and Sparse Matrix-Matrix Multiplication (SpMM) for single-node multi-GPU (scale-up) platforms such as NVIDIA DGX-1 and DGX-2.

☆10
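
For readers unfamiliar with the SpMV kernel named in the description, here is a minimal CPU-only sketch in plain Python. It is not taken from S-BLAS: the CSR layout, the function names (`spmv_csr`, `split_rows`, `spmv_csr_partitioned`), and the contiguous row-block split are all illustrative assumptions. The split only suggests how rows could be divided among several devices; the partitioning S-BLAS actually uses is not stated in this listing.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Non-zeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def split_rows(n_rows, n_parts):
    """Split rows into contiguous blocks, one per hypothetical device."""
    base, extra = divmod(n_rows, n_parts)
    bounds = [0]
    for p in range(n_parts):
        bounds.append(bounds[-1] + base + (1 if p < extra else 0))
    return list(zip(bounds[:-1], bounds[1:]))


def spmv_csr_partitioned(row_ptr, col_idx, values, x, n_parts):
    """Same result as spmv_csr, computed block by block.

    Each block stands in for the work one device would do; here the
    blocks simply run one after another on the CPU.
    """
    y = []
    for start, stop in split_rows(len(row_ptr) - 1, n_parts):
        for row in range(start, stop):
            acc = 0.0
            for k in range(row_ptr[row], row_ptr[row + 1]):
                acc += values[k] * x[col_idx[k]]
            y.append(acc)
    return y


# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

print(spmv_csr(row_ptr, col_idx, values, x))
print(spmv_csr_partitioned(row_ptr, col_idx, values, x, n_parts=2))
```

Because each output row depends only on its own non-zeros and the shared input vector, a row-wise split needs no communication between blocks beyond distributing `x`, which is why it makes a convenient illustration.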

Related projects

Alternatives and complementary repositories for S-BLAS

pnnl / s-blas
This package includes the implementation for four sparse linear algebra kernels: Sparse-Matrix-Vector-Multiplication (SpMV), Sparse-Trian…
☆24 Updated 4 years ago
nulidangxueshen / ALBUS
A Method for efficiently processing SpMV using SIMD and load balancing
☆16 Updated 2 years ago
IntelligentSoftwareSystems / GaloisGPU
LonestarGPU: Irregular algorithms parallelized for GPUs
☆33 Updated 5 years ago
weifengliu-ssslab / Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on NVIDIA and AMD GPUs
☆45 Updated 8 years ago
spcl / open-earth-compiler
Development repository for the Open Earth Compiler
☆77 Updated 3 years ago
UniHD-CEG / cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic-block execution frequencies of compute kernels
☆31 Updated 3 years ago
CMU-SAFARI / SPARTA
A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in the ICS 2023 paper by Singh et al. (https:/…
☆19 Updated last year
hpcgarage / cuASR
cuASR: CUDA Algebra for Semirings
☆34 Updated 2 years ago
NUCAR-DEV / Hetero-Mark
A Benchmark Suite for Heterogeneous System Computation
☆52 Updated 3 weeks ago
GPUPeople / GPUMemManSurvey
Evaluating different memory managers for dynamic GPU memory
☆24 Updated 3 years ago
pnnl / COMET
☆37 Updated this week
spcl / FBLAS
BLAS implementation for Intel FPGA
☆76 Updated 4 years ago
ekondis / gpuroofperf-toolkit
A GPU performance prediction toolkit for CUDA programs
☆16 Updated 5 years ago
spcl / mlir-dace
Data-Centric MLIR dialect
☆38 Updated last year
GPUPeople / spECK
Efficient SpGEMM on GPU using CUDA and CSR
☆50 Updated last year
cornell-zhang / UniSparse
Code base for the OOPSLA'24 paper: UniSparse: An Intermediate Language for General Sparse Format Customization
☆28 Updated last week
cslab-ntua / sparsex
The SparseX sparse kernel optimization library
☆39 Updated 5 years ago
NVlabs / ptxmemorymodel
☆47 Updated 5 years ago
spcl / SMI
Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware
☆16 Updated 2 years ago
sstsimulator / sst-macro
SST Macro Element Library
☆34 Updated last month
ScottKolo / suitesparse-matrix-collection-website
A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection
☆22 Updated 3 weeks ago
weifengliu-ssslab / Benchmark_SpTRSV_using_CSC
A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV)
☆19 Updated 4 years ago
accel-sim / gpu-app-collection
A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.
☆45 Updated 2 months ago
PAA-NCIC / GSWITCH
A pattern-based algorithmic autotuner for graph processing on GPUs.
☆30 Updated last year
sderek / CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
☆48 Updated 6 years ago
IntelLabs / t2sp
Productive and portable performance programming across spatial architectures (FPGAs, etc.) and vector architectures (GPUs, etc.)
☆29 Updated 6 months ago
owensgroup / merge-spmm
Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018
☆71 Updated 4 years ago
LLNL / FPChecker
A dynamic analysis tool to detect floating-point errors in HPC applications.
☆33 Updated 2 years ago
mattsinc / heterosync
HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs
☆27 Updated 2 months ago