ksopyla / CudaDotProd
Different implementations of sparse matrix multiplication. All matrices are in CSR format. The code contains several CUDA kernels for the sparse matrix-dense vector product and the sparse matrix-sparse matrix product.
☆17Updated 15 years ago
Alternatives and similar repositories for CudaDotProd
Users that are interested in CudaDotProd are comparing it to the libraries listed below
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory☆299Updated 7 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM☆72Updated 9 years ago
- ☆94Updated 8 years ago
- CUDA Data Parallel Primitives Library☆438Updated 7 years ago
- Full-speed Array of Structures access☆176Updated 2 years ago
- A CUDNN minimal deep learning training code sample using LeNet.☆268Updated 2 years ago
- CSR5-based SpMV on CPUs, GPUs and Xeon Phi☆108Updated last year
- CUSP : A C++ Templated Sparse Matrix Library☆419Updated 4 months ago
- A heterogeneous multi-GPU level-3 BLAS library☆46Updated 6 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format☆22Updated 7 years ago
- Open single and half precision gemm implementations☆394Updated 2 years ago
- High Efficiency Convolution Kernel for Maxwell GPU Architecture☆137Updated 8 years ago
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)☆27Updated 10 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT, and sometimes faster)☆19Updated 8 years ago
- CLTune: An automatic OpenCL & CUDA kernel tuner☆182Updated 3 years ago
- kmeans☆55Updated 9 years ago
- Flexible GPGPU instrumentation☆89Updated 6 years ago
- Sparse matrix computation library for GPU☆58Updated 5 years ago
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)☆71Updated 7 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU☆84Updated 6 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Updated 8 years ago
- CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%☆186Updated 8 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation.☆20Updated 6 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆73Updated 10 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL☆137Updated 8 years ago
- The SHOC Benchmark Suite☆259Updated 2 months ago
- Code appendix to an OpenCL matrix-multiplication tutorial☆178Updated 8 years ago
- Benchmarking matrix multiplication implementations☆103Updated 9 years ago
- Efficient Top-K implementation on the GPU☆191Updated 6 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆109Updated 8 years ago