divyanshu-talwar / Parallel-DFS
CUDA implementation of a parallel Depth First Search (DFS) algorithm and its comparison with a serial C++ DFS implementation.
☆27Updated 6 years ago
Related projects
Alternatives and complementary repositories for Parallel-DFS
- A Library for fast Hash Tables on GPUs☆109Updated 2 years ago
- Code for paper "Engineering a High-Performance GPU B-Tree" accepted to PPoPP 2019☆52Updated 2 years ago
- Concurrent CPU-GPU Programming using Task Models☆100Updated 4 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line☆18Updated this week
- A CUDA implementation of a priority queue☆81Updated 4 years ago
- A warp-oriented dynamic hash table for GPUs☆71Updated 10 months ago
- ☆31Updated 4 years ago
- CUDA kernel author's tools☆109Updated 2 years ago
- Directed Acyclic Graph Execution Engine (DAGEE) is a C++ library that enables programmers to express computation and data movement, as ta…☆44Updated 3 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆43Updated 10 months ago
- Algorithms implemented in CUDA + resources about GPGPU☆54Updated 2 years ago
- BGHT: High-performance static GPU hash tables.☆55Updated 2 months ago
- Some CUDA design patterns and a bit of template magic for CUDA☆146Updated last year
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast!☆73Updated 6 months ago
- Home of ALP/GraphBLAS and ALP/Pregel, featuring shared- and distributed-memory auto-parallelisation of linear algebraic and vertex-centri…☆25Updated last week
- Source code examples from the Parallel Forall Blog☆94Updated 5 years ago
- Emulating DMA Engines on GPUs for Performance and Portability☆34Updated 9 years ago
- Implementation of a parallel Breadth First Search algorithm for graph traversal using CUDA and C++.☆31Updated 4 years ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU☆30Updated last month
- Implementation of breadth first search on GPU with CUDA Driver API.☆46Updated 3 years ago
- ☆90Updated 7 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆35Updated 7 years ago
- Parallel Tasking Library (PTL) - Lightweight C++11 multithreading tasking system featuring thread-pool, task-groups, and lock-free task q…☆43Updated last week
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels☆31Updated 3 years ago
- Stencil Probe - a stencil microbenchmark☆29Updated 11 years ago
- GPU B-Tree with support for versioning (snapshots).☆44Updated 3 weeks ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- Learn OpenMP examples step by step☆86Updated 3 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆82Updated last year
- A 128 bit unsigned integer class for CUDA☆43Updated 3 years ago