ntuhpc / training-ay1819
sample code/text used in NTU HPC Internal Training during AY2018-2019
☆24 Updated 6 years ago
Alternatives and similar repositories for training-ay1819
Users who are interested in training-ay1819 are comparing it to the libraries listed below:
- Seminar on selected tools in Computer Science ☆25 Updated 4 years ago
- IMPACT GPU Algorithms Teaching Labs ☆58 Updated 2 years ago
- An implementation of HPL-AI Mixed-Precision Benchmark based on hpl-2.3 ☆28 Updated 4 years ago
- A hybrid partitioner based quantum circuit simulation system on GPU ☆47 Updated 3 years ago
- My paper/code reading notes in Chinese ☆46 Updated 4 months ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆133 Updated 5 years ago
- An efficient concurrent graph processing system ☆46 Updated 3 years ago
- Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH) ☆111 Updated 2 years ago
- ☆29 Updated 5 years ago
- Learn OpenMP examples step by step ☆97 Updated 8 months ago
- Slides about how to do research ☆74 Updated 8 months ago
- matrix multiplication in CUDA ☆123 Updated 2 years ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆305 Updated last month
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆29 Updated 5 years ago
- Efficient SpGEMM on GPU using CUDA and CSR ☆57 Updated 2 years ago
- Online CUDA Occupancy Calculator ☆80 Updated 4 years ago
- A GPU FP32 computation method with Tensor Cores. ☆21 Updated 2 years ago
- Distributed Communication-Optimal Matrix-Matrix Multiplication Algorithm ☆209 Updated 5 months ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆122 Updated 3 years ago
- Some example MPI programs ☆99 Updated 14 years ago
- CUDA Flux is a profiler for GPU applications which reports the basic block execution frequencies of compute kernels ☆32 Updated 4 years ago
- A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs ☆55 Updated 4 years ago
- parser script to process pytorch autograd profiler result, convert json file to excel. ☆15 Updated 6 years ago
- CUDA by practice ☆130 Updated 5 years ago
- Training material for Nsight developer tools ☆166 Updated last year
- Some source code about matrix multiplication implementation on CUDA ☆34 Updated 7 years ago
- Exercises and Solutions for "Programming Your GPU with OpenMP: A Hands-On Introduction" ☆147 Updated 6 months ago
- ☆111 Updated 4 years ago
- ☆23 Updated 2 years ago
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆73 Updated 5 years ago