JGU-HPC / parallelprogrammingbook
supplementary material/programming exercises
☆73 Updated 3 years ago
Alternatives and similar repositories for parallelprogrammingbook
Users who are interested in parallelprogrammingbook are comparing it to the repositories listed below.
- BGHT: High-performance static GPU hash tables. ☆71 Updated 2 months ago
- Source code examples from the Parallel Forall Blog ☆96 Updated 6 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆92 Updated 2 years ago
- A warp-oriented dynamic hash table for GPUs ☆74 Updated last year
- A Library for fast Hash Tables on GPUs ☆126 Updated 3 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆156 Updated 2 years ago
- Learn OpenMP examples step by step ☆96 Updated 8 months ago
- ☆47 Updated 5 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆95 Updated 3 years ago
- Examples for using SYCL on CUDA ☆62 Updated 2 weeks ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R ☆82 Updated last year
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆109 Updated 2 years ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆213 Updated 3 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆298 Updated 2 weeks ago
- tools to create performance and roofline plots from measured data ☆59 Updated 11 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆72 Updated 2 years ago
- Examples from Programming in Parallel with CUDA ☆161 Updated 2 years ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆209 Updated 4 months ago
- Online CUDA Occupancy Calculator ☆80 Updated 3 years ago
- Kernel Tuning Toolkit ☆64 Updated 2 months ago
- General Purpose Timing Library ☆34 Updated last month
- Source code for 'Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems using C++ and SYCL' by James Reinders, Ben A… ☆277 Updated 5 months ago
- Distributed Communication-Optimal LU-factorization Algorithm ☆12 Updated 4 years ago
- A unified framework across multiple programming platforms ☆41 Updated 3 months ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated 2 years ago
- Full-speed Array of Structures access ☆173 Updated 2 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆24 Updated last year
- STREAM, for lots of devices written in many programming models ☆350 Updated last week
- Efficient SpGEMM on GPU using CUDA and CSR ☆57 Updated 2 years ago
- CSR-based SpGEMM on nVidia and AMD GPUs ☆46 Updated 9 years ago