coderonion / awesome-cuda-and-hpc
This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
☆288 Updated 3 weeks ago
Alternatives and similar repositories for awesome-cuda-and-hpc
Users that are interested in awesome-cuda-and-hpc are comparing it to the libraries listed below
- CSV spreadsheets and other material for AI accelerator survey papers ☆171 Updated last year
- ☆145 Updated last year
- This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023 ☆32 Updated 2 years ago
- A CUDA tutorial to make people learn CUDA program from 0 ☆234 Updated 11 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆425 Updated 9 months ago
- ☆65 Updated 5 months ago
- CUDA Matrix Multiplication Optimization ☆196 Updated 11 months ago
- ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference ☆126 Updated 4 months ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆251 Updated last week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆357 Updated 5 months ago
- ☆146 Updated 6 months ago
- learning how CUDA works ☆271 Updated 3 months ago
- A Easy-to-understand TensorOp Matmul Tutorial ☆364 Updated 9 months ago
- collection of benchmarks to measure basic GPU capabilities ☆385 Updated 4 months ago
- Examples of CUDA implementations by Cutlass CuTe ☆197 Updated 4 months ago
- A simple high performance CUDA GEMM implementation. ☆382 Updated last year
- A scalable High-Level Synthesis framework on MLIR ☆261 Updated last year
- Personal homepage of the Advanced Compiler Lab ☆103 Updated 2 months ago
- ☆101 Updated last year
- Repository to host and maintain scale-sim-v2 code ☆308 Updated 2 months ago
- ☆110 Updated 3 weeks ago
- Allo: A Programming Model for Composable Accelerator Design ☆240 Updated this week
- Open, Modular, Deep Learning Accelerator ☆292 Updated last year
- Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT] ☆302 Updated 2 years ago
- ☆170 Updated last year
- ☆69 Updated 8 months ago
- code reading for tvm ☆76 Updated 3 years ago
- This is the top-level repository for the Accel-Sim framework. ☆432 Updated last week
- ☆100 Updated last week
- ☆135 Updated last year