coderonion / awesome-cuda-and-hpc
This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
★229 · Updated last week
Alternatives and similar repositories for awesome-cuda-and-hpc:
Users who are interested in awesome-cuda-and-hpc are comparing it to the libraries listed below:
- CSV spreadsheets and other material for AI accelerator survey papers ★164 · Updated last year
- ★145 · Updated 10 months ago
- A scalable High-Level Synthesis framework on MLIR ★254 · Updated 10 months ago
- CUDA Matrix Multiplication Optimization ★177 · Updated 8 months ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ★212 · Updated 6 months ago
- ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference ★99 · Updated last month
- This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023 ★27 · Updated last year
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ★353 · Updated this week
- ★91 · Updated this week
- Hands-On Practical MLIR Tutorial ★20 · Updated 8 months ago
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ★27 · Updated last month
- ★94 · Updated last year
- ★134 · Updated 3 months ago
- An open-source parameterizable NPU generator with full-stack multi-target compilation stack for intelligent workloads. ★49 · Updated 2 weeks ago
- Code reading for TVM ★76 · Updated 3 years ago
- AutoSA: Polyhedral-Based Systolic Array Compiler ★215 · Updated 2 years ago
- A tutorial for CUDA & PyTorch ★131 · Updated 2 months ago
- Allo: A Programming Model for Composable Accelerator Design ★217 · Updated this week
- ★65 · Updated 5 months ago
- CUDA PTX-ISA Document, Chinese translation ★37 · Updated 2 weeks ago
- This is the top-level repository for the Accel-Sim framework. ★376 · Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ★374 · Updated 6 months ago
- LLVM OpenCL C compiler suite for ventus GPGPU ★43 · Updated 2 weeks ago
- PyTorch model to RTL flow for low latency inference ★126 · Updated last year
- Examples of CUDA implementations by Cutlass CuTe ★148 · Updated last month
- Open, Modular, Deep Learning Accelerator ★283 · Updated 11 months ago
- ★60 · Updated 2 months ago
- FRAME: Fast Roofline Analytical Modeling and Estimation ★34 · Updated last year
- Collection of benchmarks to measure basic GPU capabilities ★340 · Updated last month
- An Easy-to-understand TensorOp Matmul Tutorial ★332 · Updated 6 months ago