digital-nomad-cheng / ECE408_Applied_Parallel_Programming
CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.
☆13 Updated last year
Alternatives and similar repositories for ECE408_Applied_Parallel_Programming:
Users who are interested in ECE408_Applied_Parallel_Programming are comparing it to the repositories listed below
- CUTLASS and CuTe Examples ☆43 Updated 3 months ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆127 Updated 4 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆85 Updated 2 years ago
- ☆29 Updated 9 months ago
- ☆92 Updated 11 months ago
- Solution of Programming Massively Parallel Processors ☆42 Updated last year
- CUDA Matrix Multiplication Optimization ☆177 Updated 8 months ago
- ☆46 Updated last year
- Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y… ☆39 Updated 10 months ago
- ☆135 Updated 3 months ago
- Dissecting NVIDIA GPU Architecture ☆90 Updated 2 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆66 Updated 4 years ago
- ☆31 Updated 2 years ago
- ☆42 Updated 11 months ago
- Examples of CUDA implementations using CUTLASS CuTe ☆150 Updated 2 months ago
- Artifacts of EVT ASPLOS'24 ☆23 Updated last year
- 🎃 GPU load-balancing library for regular and irregular computations. ☆62 Updated 9 months ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core. ☆27 Updated 3 years ago
- ☆49 Updated 5 years ago
- ☆100 Updated 3 weeks ago
- An extension library of WMMA API (Tensor Core API) ☆93 Updated 8 months ago
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows ☆58 Updated 11 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆50 Updated last year
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆107 Updated 2 years ago
- Implement Flash Attention using CuTe. ☆74 Updated 3 months ago
- ☆112 Updated 3 months ago
- An extension of TVMScript for writing simple, high-performance GPU kernels with Tensor Cores. ☆51 Updated 8 months ago
- ☆43 Updated 4 years ago
- Triton to TVM transpiler. ☆19 Updated 5 months ago
- ☆90 Updated 3 weeks ago