digital-nomad-cheng / ECE408_Applied_Parallel_Programming
CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.
☆17 Updated 2 years ago
Alternatives and similar repositories for ECE408_Applied_Parallel_Programming
Users who are interested in ECE408_Applied_Parallel_Programming compare it to the repositories listed below.
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆55 Updated 2 years ago
- ☆41 Updated last year
- tutorials about polyhedral compilation. ☆58 Updated last month
- CUDA Matrix Multiplication Optimization ☆245 Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆146 Updated 5 years ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆90 Updated 3 years ago
- ☆41 Updated last year
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆118 Updated 3 years ago
- ☆50 Updated 6 years ago
- ☆109 Updated last year
- ☆141 Updated this week
- Dissecting NVIDIA GPU Architecture ☆114 Updated 3 years ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆60 Updated 9 months ago
- Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Y… ☆44 Updated last year
- TileFlow is a performance analysis tool based on Timeloop for fusion dataflows ☆65 Updated last year
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache. ☆69 Updated last week
- Optimize GEMM with tensorcore step by step ☆36 Updated 2 years ago
- Performance Prediction Toolkit for GPUs ☆39 Updated 3 years ago
- Artifacts of EVT ASPLOS'24 ☆28 Updated last year
- WaferLLM: Large Language Model Inference at Wafer Scale ☆77 Updated last month
- GPU Performance Advisor ☆65 Updated 3 years ago
- NVIDIA cuTile learn ☆119 Updated last week
- ☆212 Updated last month
- An extension library of WMMA API (Tensor Core API) ☆109 Updated last year
- A language and compiler for irregular tensor programs. ☆152 Updated last year
- ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch ☆39 Updated 8 months ago
- ☆23 Updated 8 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24) ☆55 Updated last year
- CUTLASS and CuTe Examples ☆112 Updated 2 weeks ago
- ☆13 Updated 3 years ago