ysh329 / OpenMP-101
Learn OpenMP examples step by step
☆90 Updated last month
Alternatives and similar repositories for OpenMP-101:
Users who are interested in OpenMP-101 are comparing it to the repositories listed below.
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆89 Updated last year
- ☆20 Updated 8 years ago
- Serial and parallel implementations of matrix multiplication ☆39 Updated 4 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆16 Updated 5 years ago
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler… ☆27 Updated 7 months ago
- Algorithms implemented in CUDA + resources about GPGPU ☆54 Updated 3 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last year
- Exercises and Solutions for "Programming Your GPU with OpenMP: A Hands-On Introduction" ☆131 Updated 3 months ago
- My notes on various HPC papers. ☆21 Updated 2 years ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆192 Updated 2 years ago
- ☆86 Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year
- Examples from Programming in Parallel with CUDA ☆122 Updated last year
- Examples for using SYCL on CUDA ☆60 Updated 2 weeks ago
- ☆32 Updated 4 years ago
- NVIDIA tools guide ☆102 Updated last month
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆48 Updated 4 months ago
- Implement Neural Networks in Cuda from Scratch ☆21 Updated 9 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆30 Updated 2 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 Updated last year
- AMD’s C++ library for accelerating tensor primitives ☆38 Updated this week
- NUMA-aware multi-CPU multi-GPU data transfer benchmarks ☆21 Updated last year
- Intermediate MPI lesson ☆26 Updated last year
- Examples from the "C++ From Scratch" Series ☆70 Updated 2 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆209 Updated 2 months ago
- IMPACT GPU Algorithms Teaching Labs ☆56 Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations. ☆60 Updated 8 months ago
- Learn OpenCL step by step. ☆133 Updated 2 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆59 Updated this week
- A collection of awesome algorithms, implemented in CUDA. ☆24 Updated 7 years ago