ksopyla / CudaDotProd
Different implementations of sparse matrix multiplication. All matrices are in CSR format. The code contains several CUDA kernels for two products: sparse matrix times dense vector, and sparse matrix times sparse matrix.
☆16 · Updated 13 years ago
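
For readers unfamiliar with the CSR layout mentioned in the description, the sparse matrix times dense vector product can be sketched on the CPU as below. This is only an illustrative reference in Python, not code from the repository; the function name `csr_spmv` and the array names `values`, `col_idx`, and `row_ptr` are assumptions chosen for the example. A simple CUDA kernel would typically assign one row of the outer loop to each thread, though the repository's kernels may be organized differently.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Return y = A @ x, where A is stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] is the slice of `values` for row i
    (All names here are hypothetical, chosen for this sketch.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)  # [7.0, 9.0, 14.0]
```

Because each output row depends only on its own slice of `values` and `col_idx`, the rows can be computed independently, which is what makes the operation a natural fit for a GPU.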
Related projects:
- CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD) ☆71 · Updated 6 years ago
- A heterogeneous multi-GPU level-3 BLAS library ☆45 · Updated 4 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆20 · Updated 6 years ago
- Sparse matrix pre-processing library ☆81 · Updated 4 months ago
- ☆88 · Updated 7 years ago
- Efficient LDA solution on GPUs. ☆24 · Updated 6 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69 · Updated 7 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 · Updated 7 years ago
- Sparse matrix computation library for GPU ☆54 · Updated 4 years ago
- Full-speed Array of Structures access ☆155 · Updated last year
- The Surprisingly ParalleL spArse Tensor Toolkit. ☆68 · Updated 2 years ago
- This repository contains the cuStinger data structure used for dynamic graph representation. ☆18 · Updated 5 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆29 · Updated 7 years ago
- CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1) ☆25 · Updated 9 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 · Updated 6 years ago
- CSR5-based SpMV on CPUs, GPUs and Xeon Phi ☆93 · Updated 3 months ago
- kmeans ☆53 · Updated 8 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆61 · Updated 3 years ago
- ☆30 · Updated 7 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT, and sometimes faster) ☆18 · Updated 7 years ago
- Machine Learning Toolkit for Extreme Scale (MaTEx) ☆111 · Updated 6 years ago
- ☆62 · Updated this week
- High-Performance Streaming Graph Analytics on GPUs ☆31 · Updated 5 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 · Updated 7 years ago
- Fork of magma to include more BLAS ☆28 · Updated 7 years ago
- CuSha is a CUDA-based vertex-centric graph processing framework that uses G-Shards and CW representations. ☆52 · Updated 8 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆53 · Updated 6 years ago
- LSH-GPU ANN package ☆91 · Updated 5 years ago
- Image to column ☆31 · Updated 10 years ago