ksopyla / CudaDotProd
Different implementations of sparse matrix multiplication, with all matrices in CSR format. The code contains several CUDA kernels for two products: sparse matrix times dense vector, and sparse matrix times sparse matrix.
☆16 Updated 14 years ago
Alternatives and similar repositories for CudaDotProd
Users interested in CudaDotProd are comparing it to the libraries listed below.
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD) ☆71 Updated 7 years ago
- Efficient LDA solution on GPUs. ☆24 Updated 6 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆22 Updated 7 years ago
- ☆91 Updated 8 years ago
- Full-speed Array of Structures access ☆171 Updated 2 years ago
- sparse matrix pre-processing library ☆82 Updated last year
- GPU implementation of classical molecular dynamics proxy application. ☆31 Updated 8 years ago
- Artifact of paper "Exploiting Recent SIMD Architectural Advances for Irregular Applications" ☆11 Updated 9 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- GPU-specialized parameter server for GPU machine learning. ☆101 Updated 7 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 Updated 8 years ago
- Multi-GPU Computing Benchmark Suite (CUDA) ☆42 Updated 8 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆29 Updated 8 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT, and sometimes faster) ☆18 Updated 8 years ago
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1) ☆27 Updated 10 years ago
- GPU/CPU (CUDA) Implementation of "Recurrent Memory Array Structures", Simple RNN, LSTM, Array LSTM.. ☆25 Updated 5 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation. ☆19 Updated 6 years ago
- High Efficiency Convolution Kernel for Maxwell GPU Architecture ☆134 Updated 8 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆297 Updated 6 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆62 Updated 4 years ago
- The SparseX sparse kernel optimization library ☆39 Updated 6 years ago
- Sparse matrix computation library for GPU ☆56 Updated 4 years ago
- Asynchronous Stochastic Gradient Descent with Delay Compensation ☆21 Updated 8 years ago
- RDMA Optimization on MXNet ☆14 Updated 7 years ago
- Machine Learning Toolkit for Extreme Scale (MaTEx) ☆110 Updated 6 years ago
- kmeans ☆54 Updated 9 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 8 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆106 Updated 7 years ago