rbaygildin / learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
☆56 · Updated 3 years ago
Alternatives and similar repositories for learn-gpgpu:
Users who are interested in learn-gpgpu are comparing it to the repositories listed below:
- A collection of awesome algorithms, implemented in CUDA. ☆25 · Updated 7 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆88 · Updated last year
- Learn OpenMP examples step by step ☆92 · Updated 3 months ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 · Updated last month
- Learn OpenCL step by step. ☆135 · Updated 2 years ago
- ☆44 · Updated 7 years ago
- Implementations of 2D Image Convolution algorithm with CUDA (using global memory, shared memory and constant memory) ☆17 · Updated 7 years ago
- A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources ☆93 · Updated 2 years ago
- CUDA by practice ☆126 · Updated 5 years ago
- ☆23 · Updated 3 years ago
- Serial and parallel implementations of matrix multiplication ☆40 · Updated 4 years ago
- BGHT: High-performance static GPU hash tables. ☆63 · Updated 3 weeks ago
- ☆67 · Updated 11 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆16 · Updated 5 years ago
- Examples for using SYCL on CUDA ☆62 · Updated 2 months ago
- CUDA Guide ☆64 · Updated last year
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 · Updated 11 months ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 · Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial. ☆93 · Updated 3 years ago
- CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions ☆53 · Updated 7 years ago
- A Collection of Articles and other OpenCL Papers ☆57 · Updated 6 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- Concurrent CPU-GPU Programming using Task Models ☆101 · Updated 5 years ago
- Source code examples from the Parallel Forall Blog ☆96 · Updated 6 years ago
- Simple OpenCL Samples that Build with Khronos Headers and Libs ☆101 · Updated this week
- An implementation of parallel exclusive scan in CUDA ☆62 · Updated 7 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆60 · Updated last month
- Thrust, CUB, TBB, AVX2, AVX-512, CUDA, OpenCL, OpenMP, Metal - all it takes to sum a lot of numbers fast! ☆96 · Updated this week
- A framework that supports executing unmodified CUDA source code on non-NVIDIA devices. ☆124 · Updated 4 months ago
- CNNs in Halide ☆23 · Updated 9 years ago