TravisWThompson1 / Makefile_Example_CUDA_CPP_To_Executable
Example Makefile for CUDA and C++ source files in a standard project layout.
☆48 · Updated 7 years ago
Alternatives and similar repositories for Makefile_Example_CUDA_CPP_To_Executable:
Users who are interested in Makefile_Example_CUDA_CPP_To_Executable are comparing it to the libraries listed below.
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆214 · Updated 3 months ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆38 · Updated 6 years ago
- NVIDIA tools guide ☆104 · Updated last month
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆56 · Updated last week
- CUDA Matrix Multiplication Optimization ☆167 · Updated 7 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆304 · Updated 3 weeks ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆628 · Updated last week
- Resources for the introduction to GPU programming course of the Mastère Spécialisé HPC-AI ☆22 · Updated last year
- Step-by-step optimization of CUDA SGEMM ☆289 · Updated 2 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆130 · Updated 4 years ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆710 · Updated 6 months ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆198 · Updated 2 months ago
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 · Updated 8 months ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆196 · Updated 2 years ago
- ☆231 · Updated this week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆326 · Updated 2 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆50 · Updated last week
- Training material for Nsight developer tools ☆149 · Updated 6 months ago
- Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y… ☆39 · Updated 9 months ago
- ☆42 · Updated 4 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆125 · Updated 4 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆105 · Updated last year
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆359 · Updated 5 months ago
- Stepwise optimizations of DGEMM on CPU, eventually reaching performance faster than Intel MKL, even under multithreading. ☆130 · Updated 3 years ago
- ❤️ CUDA/C++ GPU graph analytics simplified. ☆31 · Updated 2 years ago
- CUDA Kernel Benchmarking Library ☆582 · Updated 3 months ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆52 · Updated last year
- Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involveme… ☆17 · Updated 10 months ago
- A simple high performance CUDA GEMM implementation. ☆349 · Updated last year
- RAJA Performance Suite ☆118 · Updated last week