coderonion / awesome-cuda-and-hpc
This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
☆275 Updated last week
Alternatives and similar repositories for awesome-cuda-and-hpc
Users who are interested in awesome-cuda-and-hpc are comparing it to the repositories listed below.
- CSV spreadsheets and other material for AI accelerator survey papers ☆169 Updated last year
- This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023 ☆31 Updated 2 years ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆241 Updated last week
- ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference ☆120 Updated 3 months ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆355 Updated 5 months ago
- CUDA Matrix Multiplication Optimization ☆189 Updated 10 months ago
- Allo: A Programming Model for Composable Accelerator Design ☆235 Updated last week
- Code reading for TVM ☆76 Updated 3 years ago
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆35 Updated 3 months ago
- ☆144 Updated 5 months ago
- ☆148 Updated 11 months ago
- ☆145 Updated last year
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆416 Updated 9 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ☆360 Updated 8 months ago
- Hands-On Practical MLIR Tutorial ☆25 Updated 10 months ago
- CUDA PTX-ISA Document, Chinese translation ☆42 Updated last week
- Open, Modular, Deep Learning Accelerator ☆288 Updated last year
- A scalable High-Level Synthesis framework on MLIR ☆259 Updated last year
- A simple high performance CUDA GEMM implementation. ☆374 Updated last year
- Collection of benchmarks to measure basic GPU capabilities ☆377 Updated 3 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆88 Updated 2 years ago
- ☆64 Updated 5 months ago
- ☆98 Updated last year
- Examples of CUDA implementations by Cutlass CuTe ☆190 Updated 4 months ago
- ☆100 Updated this week
- ☆108 Updated last week
- ☆22 Updated 2 months ago
- ☆11 Updated 8 months ago
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆408 Updated this week
- Programming and Assignment Material for ECE 695 ☆15 Updated 4 years ago