mvandermerwe / BP-GPU-Message-Scheduling
Code for "Message Scheduling for Performant, Many-Core Belief Propagation"
☆12 Updated 6 years ago
Alternatives and similar repositories for BP-GPU-Message-Scheduling
Users who are interested in BP-GPU-Message-Scheduling are comparing it to the libraries listed below.
- Fast k nearest neighbor search using GPU ☆543 Updated 7 years ago
- Fast K-Nearest Neighbor search with GPU ☆142 Updated 8 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆156 Updated 2 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format ☆22 Updated 7 years ago
- CUDA-accelerated minimum spanning tree algorithm -- data parallel Boruvka's algorithm ☆20 Updated 9 years ago
- Introduction to CUDA programming ☆129 Updated 8 years ago
- This is a cross-platform, CUDA-based C++ library for general-purpose, unconstrained nonlinear optimization on the GPU. It implements the … ☆138 Updated 5 years ago
- CUSP : A C++ Templated Sparse Matrix Library ☆418 Updated 3 months ago
- Direct Graphical Models (DGM) C++ library, a cross-platform Conditional Random Fields library, which is optimized for parallel computing … ☆188 Updated 3 years ago
- matrix multiplication in CUDA ☆123 Updated 2 years ago
- ☆22 Updated 8 years ago
- ☆44 Updated 7 years ago
- Implementation of ConjugateGradients method using C and Nvidia CUDA ☆51 Updated 3 years ago
- Efficient graph clustering software for normalized cut and ratio association on undirected graphs. Copyright(c) 2008 Brian Kulis, Yuqiang… ☆22 Updated 13 years ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆29 Updated 8 years ago
- EGGS, a method to speed up sparse matrix operations when the same sparsity is used for multiple times. This repo contains examples that s… ☆25 Updated 5 years ago
- CUDA Data Parallel Primitives Library ☆436 Updated 7 years ago
- GPU-based large scale Approx. Nearest Neighbor Search, accepted at CVPR 2016 ☆92 Updated 7 years ago
- Source code for: Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds), accepted at ACCV 2018 ☆118 Updated 6 years ago
- This is a PyTorch implementation of the Scalpel. Node pruning for five benchmark networks and SIMD-aware weight pruning for LeNet-300-100… ☆41 Updated 7 years ago
- This example builds on the parallel-forall repo separate compilation example by adding CMake to it. ☆17 Updated 8 years ago
- kmeans clustering with multi-GPU capabilities ☆119 Updated 2 years ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 Updated 7 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆29 Updated 5 years ago
- PLEASE SEE THE OFFICIAL REPOSITORY. THIS IS NOT MAINTAINED ANYMORE. ☆93 Updated 5 years ago
- Simple example of implementing a new Tensorflow operation and its gradient in C++. ☆56 Updated 6 years ago
- Example code used in the CVPR 2015 tutorial ☆42 Updated 10 years ago
- A CUDA implementation of the k-means clustering algorithm ☆254 Updated 13 years ago
- MWE for using the Eigen library in CUDA kernels ☆120 Updated 3 years ago
- Implementation of the maximum network flow problem in CUDA. ☆32 Updated 4 years ago