ksopyla / CudaDotProd
Several implementations of sparse matrix multiplication, with all matrices stored in CSR format. The code contains CUDA kernels for two products: sparse matrix by dense vector, and sparse matrix by sparse matrix.
☆16Updated 14 years ago
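For readers unfamiliar with the CSR layout mentioned in the description, here is a minimal CPU reference sketch of a sparse matrix by dense vector product. It is not code from CudaDotProd; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def csr_spmv(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x where A is stored in CSR form (illustrative sketch).

    row_ptr[i]..row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (the nonzero entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix (hypothetical data):
# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [7.0, 9.0, 14.0]
```

Each row's dot product is independent of the others, so a GPU version would typically assign one thread or one warp to each row; how the kernels in this repository divide the work is not stated here.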
Alternatives and similar repositories for CudaDotProd:
Users who are interested in CudaDotProd are comparing it to the libraries listed below:
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)☆71Updated 7 years ago
- sparse matrix pre-processing library☆82Updated 10 months ago
- The Surprisingly ParalleL spArse Tensor Toolkit.☆70Updated 3 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM☆70Updated 8 years ago
- ☆91Updated 8 years ago
- a heterogeneous multiGPU level-3 BLAS library☆45Updated 5 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format☆21Updated 6 years ago
- kmeans☆54Updated 8 years ago
- Efficient LDA solution on GPUs.☆24Updated 6 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms☆12Updated 7 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation.☆19Updated 6 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code☆61Updated 4 years ago
- Sparse matrix computation library for GPU☆54Updated 4 years ago
- Training deep neural networks with low precision multiplications☆63Updated 9 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Updated 7 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark☆32Updated 8 years ago
- CuSha is a CUDA-based vertex-centric graph processing framework that uses G-Shards and CW representations.☆52Updated 9 years ago
- A high performance implementation of kmeans algorithm with cuda☆18Updated 10 years ago
- image to column☆30Updated 10 years ago
- Simple MXNet sequence-to-sequence model (neural machine translation)☆24Updated 7 years ago
- Benchmarking matrix multiplication implementations☆98Updated 8 years ago
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)☆27Updated 9 years ago
- TTC: A high-performance Compiler for Tensor Transpositions☆20Updated 7 years ago
- Full-speed Array of Structures access☆167Updated last year
- A fast deep neural network library (CPU) for speech recognition☆84Updated 6 years ago
- High Efficiency Convolution Kernel for Maxwell GPU Architecture☆134Updated 7 years ago
- GPU implementation of classical molecular dynamics proxy application.☆31Updated 8 years ago
- High-Performance Tensor Transpose library☆190Updated last year
- High-Performance Streaming Graph Analytics on GPUs☆31Updated 6 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory☆297Updated 6 years ago