divyanshu-talwar / Parallel-DFS
CUDA implementation of a parallel Depth-First Search (DFS) algorithm and its comparison with a serial C++ DFS implementation.
☆26 · Updated 6 years ago
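For reference, the serial side of such a comparison can be quite small. The sketch below is an iterative, stack-based DFS in C++ over an adjacency list. It is an illustrative assumption, not code from the repository: the function name `serial_dfs` and the `std::vector<std::vector<int>>` graph representation are hypothetical.

```cpp
#include <stack>
#include <vector>

// Hypothetical sketch, not taken from the Parallel-DFS repository.
// Serial iterative depth-first search over an adjacency list.
// Returns vertices in the order they are first visited (preorder),
// starting from `source`. Unreachable vertices are not included.
std::vector<int> serial_dfs(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::stack<int> pending;
    pending.push(source);
    while (!pending.empty()) {
        int u = pending.top();
        pending.pop();
        if (visited[u]) continue;  // a vertex may be pushed more than once
        visited[u] = true;
        order.push_back(u);
        // Push neighbors in reverse so that adj[u][0] is popped (explored) first,
        // which reproduces the visit order of the recursive formulation.
        for (auto it = adj[u].rbegin(); it != adj[u].rend(); ++it) {
            if (!visited[*it]) pending.push(*it);
        }
    }
    return order;
}
```

Marking a vertex when it is popped, rather than when it is pushed, keeps the preorder identical to recursive DFS while avoiding call-stack overflow on deep graphs. Each step depends on the vertex chosen by the previous one, which is why DFS is generally considered harder to parallelize on a GPU than BFS.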
Related projects:
- Implementation of a parallel Breadth-First Search (BFS) algorithm for graph traversal using CUDA and C++. ☆30 · Updated 4 years ago
- Code for the paper "Engineering a High-Performance GPU B-Tree", accepted to PPoPP 2019 ☆51 · Updated 2 years ago
- A CUDA implementation of a priority queue ☆80 · Updated 4 years ago
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆39 · Updated 8 months ago
- Concurrent CPU-GPU Programming using Task Models ☆99 · Updated 4 years ago
- A warp-oriented dynamic hash table for GPUs ☆70 · Updated 8 months ago
- GPU B-Tree with support for versioning (snapshots). ☆39 · Updated 5 months ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and its arguments from the command line ☆18 · Updated 2 weeks ago
- Implementation of breadth-first search on the GPU with the CUDA Driver API. ☆46 · Updated 3 years ago
- My notes on various HPC papers. ☆21 · Updated last year
- ☆26 · Updated 4 years ago
- A 128-bit unsigned integer class for CUDA ☆42 · Updated 2 years ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta… ☆43 · Updated 2 years ago
- BGHT: High-performance static GPU hash tables. ☆53 · Updated this week
- A library for fast hash tables on GPUs ☆108 · Updated 2 years ago
- Learn OpenMP examples step by step ☆81 · Updated 3 years ago
- Boost.org graph_parallel module ☆26 · Updated last month
- Scalable High-performance Algorithms and Data-structures ☆122 · Updated 9 months ago
- A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines ☆308 · Updated 4 months ago
- Multi-GPU dynamic scheduler using PGAS-style cross-GPU communication ☆27 · Updated last year
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string that work on both CPU and GPU ☆28 · Updated 3 weeks ago
- Parallel Graph Input Output ☆17 · Updated last year
- Generate simple index ranges in C++ and CUDA C++ ☆38 · Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations. ☆56 · Updated 3 months ago
- IMPACT GPU Algorithms Teaching Labs ☆55 · Updated last year
- LEMON Graph Library ☆27 · Updated 4 years ago
- CUDA kernel author's tools ☆105 · Updated 2 years ago
- A parallel implementation of DFS for Directed Acyclic Graphs (https://research.nvidia.com/publication/parallel-depth-first-search-directe… ☆47 · Updated 3 years ago
- Unit benchmarks of CUDA event APIs. ☆17 · Updated 4 months ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆20 · Updated 10 months ago