gmarciani / cudawesome
A collection of awesome algorithms, implemented in CUDA.
☆25 · Updated 7 years ago
Alternatives and similar repositories for cudawesome
Users who are interested in cudawesome are comparing it to the libraries listed below.
- Algorithms implemented in CUDA + resources about GPGPU ☆56 · Updated 3 years ago
- Template for starting a CUDA/C++ project using CMake, with GitHub Actions for CI ☆29 · Updated 2 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆89 · Updated last year
- Next-generation LAPACK implementation for the ROCm platform ☆103 · Updated this week
- A simple profiler to count NVIDIA PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆55 · Updated 3 months ago
- ☆60 · Updated 2 years ago
- ☆67 · Updated 11 years ago
- Next-generation library for iterative sparse solvers for the ROCm platform ☆81 · Updated this week
- Examples for using SYCL on CUDA ☆62 · Updated 2 weeks ago
- My notes on various HPC papers. ☆22 · Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆172 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆84 · Updated last week
- Examples for HIP ☆208 · Updated 6 months ago
- CUDA implementation of exclusive prefix sum via Blelloch's algorithm ☆28 · Updated 7 years ago
- Next-generation SPARSE implementation for the ROCm platform ☆127 · Updated this week
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆97 · Updated 11 months ago
- rocWMMA ☆115 · Updated last week
- Asynchronous Task and Memory Interface, or ATMI, is a runtime framework and programming model for heterogeneous CPU-GPU systems. It provi… ☆68 · Updated last year
- ☆23 · Updated 3 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs ☆69 · Updated 2 years ago
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program (mastère spécialisé) ☆21 · Updated last year
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆56 · Updated 2 months ago
- OpenCL for Visual Studio Code ☆43 · Updated last week
- AMD’s C++ library for accelerating tensor primitives ☆42 · Updated this week
- portDNN is a library implementing neural network algorithms written using SYCL ☆113 · Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 · Updated last year
- Multi-GPU Framework for Voxel Grid Computations ☆57 · Updated this week
- CUDA Guide ☆67 · Updated last year
- CUDA implementation of the fundamental sum-reduce operation. Aims to be as optimized as reasonable. ☆37 · Updated 7 years ago
- An extension library of the WMMA API (Tensor Core API) ☆99 · Updated 11 months ago