MTB90 / cuda-floyd_warshall
CUDA implementation of the blocked Floyd-Warshall all-pairs shortest path graph algorithm
☆37 Updated 6 years ago
Related projects:
- Implementation of breadth first search on GPU with CUDA Driver API. ☆46 Updated 3 years ago
- ☆88 Updated 7 years ago
- A warp-oriented dynamic hash table for GPUs ☆70 Updated 8 months ago
- An implementation of the revised simplex algorithm in CUDA for solving linear optimization problems in the form max{c*x | A*x=b, l<=x<=u} ☆27 Updated 7 years ago
- Asynchronous Multi-GPU Programming Framework ☆45 Updated 3 years ago
- A Distributed Multi-GPU System for Fast Graph Processing ☆63 Updated 5 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆61 Updated 3 years ago
- A Library for fast Hash Tables on GPUs ☆108 Updated 2 years ago
- Full-speed Array of Structures access ☆155 Updated last year
- CUDA Tensor Transpose (cuTT) library ☆49 Updated 7 years ago
- ❤️ CUDA/C++ GPU graph analytics simplified. ☆30 Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆144 Updated last year
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆70 Updated 3 years ago
- CUSP : A C++ Templated Sparse Matrix Library ☆400 Updated 8 months ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019 ☆51 Updated 2 years ago
- ☆26 Updated 4 years ago
- CUDA kernel author's tools ☆105 Updated 2 years ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆25 Updated 7 years ago
- BGHT: High-performance static GPU hash tables. ☆53 Updated this week
- Hornet data structure for sparse dynamic graphs and matrices ☆78 Updated 4 years ago
- a CUDA implementation of a priority queue ☆80 Updated 4 years ago
- sparse matrix pre-processing library ☆81 Updated 4 months ago
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 Updated 2 years ago
- A cross-platform CUDA/C++17 starter project with google test and google benchmark support. ☆35 Updated last year
- A new QR decomposition algorithm implemented in CUDA ☆15 Updated 2 months ago
- Sparse Matrix-Matrix Multiplication Benchmark on Intel Xeon and Xeon Phi (KNC, KNL) from blog post: ☆12 Updated 7 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆35 Updated 7 years ago
- CuSha is a CUDA-based vertex-centric graph processing framework that uses G-Shards and CW representations. ☆52 Updated 8 years ago
- Concurrent CPU-GPU Programming using Task Models ☆99 Updated 4 years ago