h3ct0rjs / HighPerformanceComputing
Class of High Performance Computing taken at U.T.P 2017
☆60 · Updated 7 years ago
Alternatives and similar repositories for HighPerformanceComputing
Users who are interested in HighPerformanceComputing compare it to the repositories listed below:
- CUDA Matrix Multiplication Optimization ☆188 · Updated 10 months ago
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆35 · Updated 3 months ago
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content] ☆67 · Updated 2 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆67 · Updated 4 years ago
- Examples from Programming in Parallel with CUDA ☆149 · Updated 2 years ago
- NVIDIA tools guide ☆133 · Updated 4 months ago
- IMPACT GPU Algorithms Teaching Labs ☆57 · Updated 2 years ago
- Training material for Nsight developer tools ☆157 · Updated 9 months ago
- Machine Learning Compiler Road Map ☆43 · Updated last year
- ☆158 · Updated 10 months ago
- Reference Kernels for the Leaderboard ☆49 · Updated last week
- Efficient implementations of Merge Sort and Bitonic Sort algorithms using CUDA for GPU parallel processing, resulting in accelerated sort… ☆15 · Updated last year
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆275 · Updated this week
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆79 · Updated 2 years ago
- CUDA Guide ☆66 · Updated last year
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆364 · Updated 4 months ago
- Step-by-step optimization of CUDA SGEMM ☆333 · Updated 3 years ago
- CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. … ☆419 · Updated last year
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆88 · Updated last year
- Examples of CUDA implementations by Cutlass CuTe ☆188 · Updated 4 months ago
- An experimental CPU backend for Triton ☆119 · Updated this week
- ☆169 · Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆134 · Updated 4 years ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆131 · Updated 5 years ago
- A set of hands-on tutorials for CUDA programming ☆223 · Updated last year
- Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak Performance. ☆79 · Updated 3 weeks ago
- Solution of Programming Massively Parallel Processors ☆47 · Updated last year
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆182 · Updated 4 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ☆360 · Updated 8 months ago
- Main Book repository for the Parallel and High Performance Computing book, Manning Publications ☆206 · Updated 3 years ago