mvandermerwe / BP-GPU-Message-Scheduling
Code for "Message Scheduling for Performant, Many-Core Belief Propagation"
☆12 Updated 6 years ago
Alternatives and similar repositories for BP-GPU-Message-Scheduling
Users interested in BP-GPU-Message-Scheduling are comparing it to the libraries listed below
- Some CUDA design patterns and a bit of template magic for CUDA ☆158 Updated 2 years ago
- Fast K-Nearest Neighbor search with GPU ☆143 Updated 8 years ago
- matrix multiplication in CUDA ☆125 Updated 2 years ago
- Introduction to CUDA programming ☆129 Updated 8 years ago
- Efficient CUDA Stream Compaction Library ☆35 Updated 2 years ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆29 Updated 8 years ago
- CUSP : A C++ Templated Sparse Matrix Library ☆420 Updated 6 months ago
- Fast k nearest neighbor search using GPU ☆546 Updated 7 years ago
- ☆43 Updated 8 years ago
- ☆22 Updated 8 years ago
- This is a PyTorch implementation of the Scalpel. Node pruning for five benchmark networks and SIMD-aware weight pruning for LeNet-300-100… ☆41 Updated 7 years ago
- Implementation of the maximum network flow problem in CUDA. ☆31 Updated 5 years ago
- GPU-based large scale Approx. Nearest Neighbor Search, accepted at CVPR 2016 ☆92 Updated 7 years ago
- ☆101 Updated 6 years ago
- Efficient graph clustering software for normalized cut and ratio association on undirected graphs. Copyright(c) 2008 Brian Kulis, Yuqiang… ☆22 Updated 13 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆31 Updated 5 years ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 Updated 7 years ago
- CUDA Data Parallel Primitives Library ☆438 Updated 7 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆22 Updated 7 years ago
- Programmable Neural Network Compression ☆149 Updated 3 years ago
- A GPU/CUDA implementation of the Hungarian algorithm ☆116 Updated 6 years ago
- EGGS, a method to speed up sparse matrix operations when the same sparsity is used multiple times. This repo contains examples that s… ☆26 Updated 5 years ago
- CUDA-accelerated minimum spanning tree algorithm -- data parallel Boruvka's algorithm ☆21 Updated 9 years ago
- Implementation of ConjugateGradients method using C and Nvidia CUDA ☆52 Updated 3 years ago
- ☆20 Updated 7 years ago
- CNNs in Halide ☆23 Updated 10 years ago
- [ECCV18] Constraint-Aware Deep Neural Network Compression ☆12 Updated 7 years ago
- This example builds on the parallel-forall repo separate compilation example by adding CMake to it. ☆17 Updated 8 years ago
- Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018 ☆120 Updated 6 years ago
- Compute the exact Euclidean Distance Transform and Voronoi Diagram for 2D and 3D binary images using the GPU. ☆80 Updated 5 years ago