JGU-HPC / parallelprogrammingbook
supplementary material/programming exercises
☆71 Updated 3 years ago
Related projects:
- ☆25 Updated 4 years ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆50 Updated last year
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆33 Updated 5 years ago
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆90 Updated 2 years ago
- ☆39 Updated 4 years ago
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆20 Updated 10 months ago
- A Library for fast Hash Tables on GPUs ☆108 Updated 2 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆165 Updated 3 months ago
- ☆88 Updated 7 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆109 Updated 4 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆56 Updated 10 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆56 Updated 3 months ago
- General Purpose Timing Library ☆31 Updated 4 months ago
- High-performance, GPU-aware communication library ☆85 Updated last month
- CSR5-based SpMV on CPUs, GPUs and Xeon Phi ☆93 Updated 3 months ago
- Online CUDA Occupancy Calculator ☆65 Updated 2 years ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆168 Updated 2 years ago
- tools to create performance and roofline plots from measured data ☆57 Updated 10 years ago
- TLB Benchmarks ☆32 Updated 7 years ago
- GPU Performance Advisor ☆58 Updated 2 years ago
- A warp-oriented dynamic hash table for GPUs ☆70 Updated 8 months ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆96 Updated 7 years ago
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆57 Updated 6 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆38 Updated last year
- NCCL Examples from Official NVIDIA NCCL Developer Guide. ☆11 Updated 6 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆21 Updated 6 years ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆22 Updated last year
- Examples from Programming in Parallel with CUDA ☆101 Updated last year
- Full-speed Array of Structures access ☆155 Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆21 Updated last week