suco-gt / HPC-Internships
Supercomputing @ GT has compiled a list of organizations that offer internships and experiences in HPC and applications of HPC.
☆49, updated 9 months ago
Related projects:
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆165, updated 3 months ago
- Example Makefile for CUDA and C++ source files in a standard project layout. ☆46, updated 6 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆126, updated 4 years ago
- Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y… ☆35, updated 3 months ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆545, updated last month
- collection of benchmarks to measure basic GPU capabilities ☆241, updated 3 months ago
- Fast CUDA matrix multiplication from scratch ☆423, updated 8 months ago
- Solution of Programming Massively Parallel Processors ☆29, updated 8 months ago
- CUDA Matrix Multiplication Optimization ☆118, updated 2 months ago
- Step-by-step optimization of CUDA SGEMM ☆207, updated 2 years ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆41, updated 3 weeks ago
- Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL, even under multithreading. ☆103, updated 2 years ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆31, updated 9 months ago
- N-Ways to Multi-GPU Programming ☆13, updated last year
- C++ package to store Matrix Market (.mtx) file format sparse matrices in Compressed Row Storage (CSR) format. ☆11, updated 4 years ago
- Implementation and analysis of five different GPU based SPMV algorithms in CUDA ☆33, updated 5 years ago
- IMPACT GPU Algorithms Teaching Labs ☆55, updated last year
- The repository holds the exercises and solutions for my online OpenMP tutorial series ☆119, updated 3 years ago
- Advanced Matrix Extensions (AMX) Guide ☆66, updated 2 years ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆22, updated last year
- Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite ☆57, updated 6 years ago
- Mirror of http://gitlab.hpcrl.cse.ohio-state.edu/chong/ppopp19_ae, refactoring for understanding ☆12, updated 2 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆79, updated last year
- Performance Prediction Toolkit for GPUs ☆28, updated 2 years ago
- SparseP is the first open-source Sparse Matrix Vector Multiplication (SpMV) software package for real-world Processing-In-Memory (PIM) ar… ☆71, updated 2 years ago
- Some source code about matrix multiplication implementation on CUDA ☆35, updated 6 years ago
- Source code of the paper "OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs" ☆11, updated 2 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆109, updated 4 years ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆187, updated 2 months ago