lzhengchun / matrix-cuda
matrix multiplication in CUDA
☆123 Updated last year
Alternatives and similar repositories for matrix-cuda:
Users who are interested in matrix-cuda also compare it to the repositories listed below.
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆130 Updated 4 years ago
- ☆436 Updated 9 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆71 Updated 4 years ago
- CUDA by practice ☆125 Updated 5 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆130 Updated 4 years ago
- ☆91 Updated 8 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- CSR-based SpGEMM on NVIDIA and AMD GPUs ☆45 Updated 9 years ago
- CUDA Matrix Multiplication Optimization ☆181 Updated 9 months ago
- Fast CUDA Kernels for ResNet Inference. ☆173 Updated 5 years ago
- CUDA official sample codes ☆366 Updated 9 years ago
- cuDNN sample codes provided by NVIDIA ☆45 Updated 6 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆51 Updated 7 years ago
- Implementation of breadth-first search on GPU with CUDA Driver API. ☆49 Updated 4 years ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆52 Updated last year
- CUDA for MNIST training/inference ☆40 Updated last year
- ☆21 Updated 2 years ago
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆96 Updated 7 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆339 Updated 3 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆79 Updated last year
- Implementation and analysis of five different GPU-based SpMV algorithms in CUDA ☆39 Updated 6 years ago
- Dissecting NVIDIA GPU Architecture ☆92 Updated 2 years ago
- A library of GPU kernels for sparse matrix operations. ☆262 Updated 4 years ago
- Step-by-step optimization of CUDA SGEMM ☆310 Updated 3 years ago
- Implementation of a simple CNN using CUDA ☆68 Updated 7 years ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆689 Updated 2 months ago
- Efficient Top-K implementation on the GPU ☆176 Updated 6 years ago
- Introduction to CUDA programming ☆116 Updated 7 years ago
- This is a tuned sparse matrix-dense vector multiplication (SpMV) library ☆21 Updated 9 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆255 Updated last month