uuudown / S-BLAS
This package implements Sparse-Matrix-Vector-Multiplication (SpMV) and Sparse-Matrix-Matrix-Multiplication (SpMM) for single-node multi-GPU (scale-up) platforms such as NVIDIA DGX-1 and DGX-2.
☆10, updated 4 years ago
Related projects
Alternatives and complementary repositories for S-BLAS
- This package implements four sparse linear algebra kernels: Sparse-Matrix-Vector-Multiplication (SpMV), Sparse-Trian… (☆24, updated 4 years ago)
- A method for efficiently processing SpMV using SIMD and load balancing (☆16, updated 2 years ago)
- LonestarGPU: Irregular algorithms parallelized for GPUs (☆33, updated 5 years ago)
- CSR-based SpGEMM on NVIDIA and AMD GPUs (☆45, updated 8 years ago)
- Development repository for the Open Earth Compiler (☆77, updated 3 years ago)
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels (☆31, updated 3 years ago)
- A novel spatial accelerator for horizontal diffusion weather stencil computation, as described in ICS 2023 paper by Singh et al. (https:/… (☆19, updated last year)
- cuASR: CUDA Algebra for Semirings (☆34, updated 2 years ago)
- A Benchmark Suite for Heterogeneous System Computation (☆52, updated 3 weeks ago)
- Evaluating different memory managers for dynamic GPU memory (☆24, updated 3 years ago)
- (☆37, updated this week)
- BLAS implementation for Intel FPGA (☆76, updated 4 years ago)
- A GPU performance prediction toolkit for CUDA programs (☆16, updated 5 years ago)
- Data-Centric MLIR dialect (☆38, updated last year)
- Efficient SpGEMM on GPU using CUDA and CSR (☆50, updated last year)
- Code base for OOPSLA'24 paper: UniSparse: An Intermediate Language for General Sparse Format Customization (☆28, updated last week)
- The SparseX sparse kernel optimization library (☆39, updated 5 years ago)
- (☆47, updated 5 years ago)
- Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware (☆16, updated 2 years ago)
- SST Macro Element Library (☆34, updated last month)
- A web interface for the SuiteSparse Matrix Collection, formerly known as the University of Florida Sparse Matrix Collection (☆22, updated 3 weeks ago)
- A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves (SpTRSV) (☆19, updated 4 years ago)
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. (☆45, updated 2 months ago)
- A pattern-based algorithmic autotuner for graph processing on GPUs. (☆30, updated last year)
- CUDAAdvisor: a GPU profiling tool (☆48, updated 6 years ago)
- Productive and portable performance programming across spatial architectures (FPGAs, etc.) and vector architectures (GPUs, etc.) (☆29, updated 6 months ago)
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 (☆71, updated 4 years ago)
- A dynamic analysis tool to detect floating-point errors in HPC applications. (☆33, updated 2 years ago)
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs (☆27, updated 2 months ago)