pkestene / cuda-proj-tmpl
A minimal CMake-based project skeleton for developing a CUDA application
☆17Updated last year
Alternatives and similar repositories for cuda-proj-tmpl
Users who are interested in cuda-proj-tmpl are comparing it to the libraries listed below
- Resources for the introductory GPU programming course of the HPC-AI Specialized Master's program☆22Updated last year
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆29Updated 11 months ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆37Updated 7 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆68Updated 2 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- ☆23Updated 3 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆88Updated last year
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆47Updated last week
- BGHT: High-performance static GPU hash tables.☆65Updated 2 months ago
- Intermediate MPI lesson☆28Updated 2 years ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API☆82Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Updated 2 months ago
- Distributed Communication-Optimal LU-factorization Algorithm☆12Updated 3 years ago
- ☆18Updated 5 years ago
- Template for starting a CUDA/C++ project using CMake with GitHub Actions for CI☆29Updated 2 years ago
- Efficient SpGEMM on GPU using CUDA and CSR☆54Updated last year
- ☆40Updated 4 years ago
- Reusable software components for ROCm developers☆84Updated this week
- Next generation library for iterative sparse solvers for ROCm platform☆81Updated last week
- A Library for fast Hash Tables on GPUs☆119Updated 2 years ago
- cuASR: CUDA Algebra for Semirings☆35Updated 2 years ago
- Source code examples from the Parallel Forall Blog☆96Updated 6 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems☆131Updated 5 years ago
- An implementation of parallel exclusive scan in CUDA☆62Updated 7 years ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators☆95Updated 2 weeks ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial☆268Updated this week
- Easier, quicker command-line CUDA profiling☆12Updated 8 months ago
- Some CUDA design patterns and a bit of template magic for CUDA☆154Updated 2 years ago
- MiniAMR Adaptive Mesh Refinement (AMR) Mini-App☆34Updated 6 months ago
- 🎃 GPU load-balancing library for regular and irregular computations.☆62Updated 11 months ago