mvandermerwe / BP-GPU-Message-Scheduling
Code for "Message Scheduling for Performant, Many-Core Belief Propagation"
☆10Updated 5 years ago
Alternatives and similar repositories for BP-GPU-Message-Scheduling:
Users who are interested in BP-GPU-Message-Scheduling are comparing it to the libraries listed below:
- Sparse-dense matrix-matrix multiplication on GPUs☆15Updated 6 years ago
- Implementation of the maximum network flow problem in CUDA.☆29Updated 4 years ago
- CNNs in Halide☆23Updated 9 years ago
- Some CUDA design patterns and a bit of template magic for CUDA☆148Updated last year
- Repository holding the code base for AC-SpGEMM: "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆28Updated 4 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS.☆49Updated 6 years ago
- EGGS, a method to speed up sparse matrix operations when the same sparsity is used multiple times. This repo contains examples that s…☆25Updated 4 years ago
- CUDA-accelerated minimum spanning tree algorithm -- data parallel Boruvka's algorithm☆18Updated 8 years ago
- Fast integer division with divisor not known at compile time. To be used primarily in CUDA kernels.☆71Updated 9 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format☆20Updated 6 years ago
- A GPU/CUDA implementation of the Hungarian algorithm☆109Updated 5 years ago
- A tuned sparse matrix-dense vector multiplication (SpMV) library☆21Updated 8 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆15Updated 5 years ago
- Efficient CUDA Stream Compaction Library☆33Updated last year
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Updated 7 years ago
- Implementation of the Winograd minimal convolution algorithm on Intel Architecture☆39Updated 7 years ago
- Efficient Top-K implementation on the GPU☆150Updated 5 years ago
- Sparse matrix-matrix multiplication on CPU+GPU systems.☆13Updated 10 years ago
- A Unified, Systematic Framework of Structured Weight Pruning for DNNs☆22Updated 6 years ago
- Efficient SpGEMM on GPU using CUDA and CSR☆50Updated last year
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018☆72Updated 4 years ago
- An implementation of parallel exclusive scan in CUDA☆60Updated 6 years ago
- This is a cross-platform, CUDA-based C++ library for general-purpose, unconstrained nonlinear optimization on the GPU. It implements the …☆134Updated 4 years ago
- This example builds on the parallel-forall repo separate compilation example by adding CMake to it.☆17Updated 7 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆83Updated 11 months ago
- Codebase associated with the PyTorch compiler tutorial☆44Updated 5 years ago