pkestene / cuda-proj-tmpl
A minimal CMake-based project skeleton for developing a CUDA application
☆16Updated 8 months ago
Related projects:
- Resources for the introductory GPU programming course of the HPC-AI Mastère Spécialisé (specialized master's program)☆22Updated 8 months ago
- Generate simple index ranges in C++ and CUDA C++☆38Updated last year
- BGHT: High-performance static GPU hash tables.☆53Updated this week
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆27Updated 2 months ago
- An implementation of parallel exclusive scan in CUDA☆57Updated 6 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆109Updated 4 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆35Updated 7 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆78Updated last year
- Subset of BLAS routines optimized for NVIDIA GPUs☆63Updated last year
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.☆39Updated 8 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API☆56Updated 10 months ago
- ☆21Updated 2 years ago
- Examples for using SYCL on CUDA☆59Updated 2 weeks ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆41Updated 3 weeks ago
- Efficient SpGEMM on GPU using CUDA and CSR☆50Updated last year
- High-performance, GPU-aware communication library☆85Updated last month
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆165Updated 3 months ago
- Distributed Communication-Optimal LU-factorization Algorithm☆12Updated 3 years ago
- ☆26Updated 4 years ago
- Online CUDA Occupancy Calculator☆65Updated 2 years ago
- ☆30Updated 3 years ago
- An extension library of WMMA API (Tensor Core API)☆81Updated 2 months ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems☆126Updated 4 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆21Updated last week
- 🎃 GPU load-balancing library for regular and irregular computations.☆56Updated 3 months ago
- CUDA kernel author's tools☆105Updated 2 years ago
- Fast and full-featured Matrix Market I/O library for C++, Python, and R☆73Updated last month
- Template for starting a CUDA/C++ project using CMake, with GitHub Actions for CI☆29Updated last year
- Intel Data Parallel C++ (and SYCL 2020) Tutorial.☆90Updated 2 years ago
- A Visual Studio Code extension for building and debugging CUDA applications.☆68Updated last month