ksopyla / CudaDotProd
Different implementations of sparse matrix multiplication. All matrices are in CSR format. The code contains several CUDA kernels for two products: sparse matrix times dense vector, and sparse matrix times sparse matrix.
☆16Updated 14 years ago
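For readers unfamiliar with CSR, here is a minimal CPU reference sketch of the sparse matrix times dense vector product that the description refers to. It is not taken from the repository. The `CsrMatrix` struct, its field names, and the `csr_spmv` function are hypothetical names chosen for illustration. A GPU kernel would typically distribute the outer row loop across threads, but how CudaDotProd does this is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container (illustrative names, not the repository's types).
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 entries; row r spans [row_ptr[r], row_ptr[r + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // the non-zero values, row by row
};

// Computes y = A * x for a CSR matrix A and a dense vector x.
std::vector<double> csr_spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(x.size() == a.cols);

    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        double sum = 0.0;
        // Walk only the stored (non-zero) entries of row r.
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

Each row only touches its own slice of `values` and `col_idx`, so rows can be processed independently, which is what makes the format a natural fit for parallel kernels.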
Alternatives and similar repositories for CudaDotProd:
Users who are interested in CudaDotProd are comparing it to the libraries listed below:
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)☆71Updated 7 years ago
- a heterogeneous multiGPU level-3 BLAS library☆45Updated 5 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory☆297Updated 6 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM☆70Updated 8 years ago
- sparse matrix pre-processing library☆81Updated 11 months ago
- Full-speed Array of Structures access☆169Updated last year
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format☆21Updated 6 years ago
- Efficient LDA solution on GPUs.☆24Updated 6 years ago
- Sparse matrix computation library for GPU☆56Updated 4 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code☆61Updated 4 years ago
- CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%☆187Updated 7 years ago
- ☆91Updated 8 years ago
- High Efficiency Convolution Kernel for Maxwell GPU Architecture☆134Updated 7 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT and sometimes faster)☆18Updated 7 years ago
- GPU/CPU (CUDA) Implementation of "Recurrent Memory Array Structures", Simple RNN, LSTM, Array LSTM..☆25Updated 5 years ago
- GPU-based large scale Approx. Nearest Neighbor Search, accepted at CVPR 2016☆92Updated 6 years ago
- GPU-specialized parameter server for GPU machine learning.☆101Updated 7 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation.☆19Updated 6 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL☆135Updated 8 years ago
- image to column☆30Updated 10 years ago
- Benchmarks for CNTK and other toolkits.☆44Updated 9 years ago
- Efficient Top-K implementation on the GPU☆175Updated 6 years ago
- kmeans☆54Updated 8 years ago
- Open single and half precision gemm implementations☆380Updated 2 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms☆12Updated 7 years ago
- A CUDNN minimal deep learning training code sample using LeNet.☆264Updated last year
- Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning☆33Updated 8 years ago
- Fork of magma to include more BLAS☆28Updated 8 years ago
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)☆27Updated 9 years ago
- CUSP : A C++ Templated Sparse Matrix Library☆411Updated 5 months ago