rbaygildin / learn-gpgpu
Algorithms implemented in CUDA + resources about GPGPU
☆54 Updated 2 years ago
Related projects
Alternatives and complementary repositories for learn-gpgpu
- A collection of awesome algorithms, implemented in CUDA. ☆24 Updated 6 years ago
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆43 Updated 10 months ago
- Learn OpenMP examples step by step ☆87 Updated 3 years ago
- ☆64 Updated 10 years ago
- Learn OpenCL step by step. ☆132 Updated 2 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆82 Updated last year
- OpenCL Tutorials ☆47 Updated 4 years ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 Updated 6 months ago
- Template for starting CUDA/C++ project using CMake with Github Action for CI ☆29 Updated last year
- ☆42 Updated 6 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆17 Updated 4 years ago
- ☆23 Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA ☆146 Updated last year
- Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast! ☆73 Updated 6 months ago
- CS344 - Introduction To Parallel Programming course (Udacity) proposed solutions ☆51 Updated 7 years ago
- Examples from Programming in Parallel with CUDA ☆108 Updated last year
- CUDA kernel author's tools ☆109 Updated 2 years ago
- Examples for using SYCL on CUDA ☆60 Updated 2 weeks ago
- AMD ROCm Performance Primitives (RPP) library is a comprehensive high-performance computer vision library for AMD processors with HIP/Ope… ☆55 Updated this week
- RTX compute samples ☆69 Updated last year
- BGHT: High-performance static GPU hash tables. ☆55 Updated 2 months ago
- CUDA accelerated medical imaging algorithms ☆13 Updated 2 years ago
- SYCL for Vitis: Experimental fusion of triSYCL with Intel SYCL oneAPI DPC++ up-streaming effort into Clang/LLVM ☆111 Updated 2 weeks ago
- Concurrent CPU-GPU Programming using Task Models ☆100 Updated 4 years ago
- Examples for HIP ☆200 Updated 2 weeks ago
- ☆37 Updated 3 years ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆25 Updated 7 years ago
- Source code examples from the Parallel Forall Blog ☆94 Updated 5 years ago
- Generate simple index ranges in C++ and CUDA C++ ☆39 Updated last year