pkestene / cuda-proj-tmpl
A minimal CMake-based project skeleton for developing a CUDA application
☆15Updated last year
Alternatives and similar repositories for cuda-proj-tmpl:
Users who are interested in cuda-proj-tmpl are comparing it to the repositories listed below
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program☆22Updated last year
- Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceler…☆27Updated 7 months ago
- BGHT: High-performance static GPU hash tables.☆57Updated 4 months ago
- ☆11Updated 5 years ago
- Generate simple index ranges in C++ and CUDA C++☆39Updated last year
- ☆37Updated 3 years ago
- MiniAMR Adaptive Mesh Refinement (AMR) Mini-App☆33Updated 2 months ago
- ☆16Updated 5 years ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆88Updated last year
- Highly Efficient FFT for Exascale☆36Updated 9 months ago
- Template for starting CUDA/C++ project using CMake with Github Action for CI☆29Updated 2 years ago
- Distributed View Extension for Kokkos☆43Updated last month
- GTensor is a multi-dimensional array C++14 header-only library for hybrid GPU development.☆35Updated 4 months ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆30Updated last month
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems☆130Updated 4 years ago
- Subset of BLAS routines optimized for NVIDIA GPUs☆67Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations.☆59Updated 7 months ago
- MagmaDNN: a simple deep learning framework in C++☆49Updated 4 years ago
- An extension library of WMMA API (Tensor Core API)☆87Updated 6 months ago
- NPBench - A Benchmarking Suite for High-Performance NumPy☆76Updated 2 months ago
- Local and distributed octrees based on Morton codes with halo discovery and exchange with a 3D collision detection algorithm☆41Updated this week
- My notes on various HPC papers.☆21Updated 2 years ago
- A Visual Studio Code extension for building and debugging CUDA applications.☆71Updated 6 months ago
- High-performance, GPU-aware communication library☆84Updated 3 weeks ago
- Examples for using SYCL on CUDA☆60Updated 3 weeks ago
- Distributed Communication-Optimal LU-factorization Algorithm☆12Updated 3 years ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems.☆46Updated 3 months ago
- ☆23Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA☆148Updated last year
- SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) sy…☆104Updated 2 weeks ago