lcpu-club / awesome-rocm
Collections and tutorials for ROCm
☆25 · Updated last week
Alternatives and similar repositories for awesome-rocm
Users who are interested in awesome-rocm are comparing it to the libraries listed below
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software · ☆35 · Updated 3 months ago
- Advanced Matrix Extensions (AMX) Guide · ☆90 · Updated 3 years ago
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… · ☆275 · Updated this week
- DeepSeek-V3/R1 inference performance simulator · ☆134 · Updated 2 months ago
- Intel® Tensor Processing Primitives extension for PyTorch* · ☆17 · Updated 2 weeks ago
- CUTLASS and CuTe Examples · ☆52 · Updated 5 months ago
- oneAPI Collective Communications Library (oneCCL) · ☆234 · Updated 2 weeks ago
- Microsoft Collective Communication Library · ☆65 · Updated 6 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. · ☆86 · Updated last week
- ☆146 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… · ☆61 · Updated 3 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. · ☆88 · Updated 2 years ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… · ☆97 · Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial · ☆268 · Updated this week
- A lightweight design for computation-communication overlap. · ☆132 · Updated last month
- ☆65 · Updated 2 months ago
- ☆99 · Updated this week
- ROCm Communication Collectives Library (RCCL) · ☆338 · Updated this week
- CSV spreadsheets and other material for AI accelerator survey papers · ☆169 · Updated last year
- Dissecting NVIDIA GPU Architecture · ☆95 · Updated 2 years ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators · ☆401 · Updated this week
- Hands-On Practical MLIR Tutorial · ☆25 · Updated 10 months ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems · ☆79 · Updated 6 months ago
- An experimental CPU backend for Triton · ☆119 · Updated this week
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. · ☆79 · Updated 2 years ago
- CUDA PTX-ISA Document (Chinese translation) · ☆42 · Updated last week
- ☆25 · Updated 3 months ago
- ☆27 · Updated 3 weeks ago
- ☆96 · Updated last year
- LLM Inference analyzer for different hardware platforms · ☆69 · Updated last week