lcpu-club / awesome-rocm
Collections and tutorials for ROCm
☆27, updated last month
Alternatives and similar repositories for awesome-rocm
Users who are interested in awesome-rocm are comparing it to the repositories listed below.
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆288, updated 3 weeks ago
- FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton. ☆53, updated this week
- DeepSeek-V3/R1 inference performance simulator ☆149, updated 3 months ago
- Solution of Programming Massively Parallel Processors ☆48, updated last year
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89, updated 2 years ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆90, updated this week
- Advanced Matrix Extensions (AMX) Guide ☆92, updated 3 years ago
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆41, updated 4 months ago
- Summary of the Specs of Commonly Used GPUs for Training and Inference of LLM ☆46, updated 3 months ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆274, updated 2 weeks ago
- ☆73, updated 2 months ago
- Intel® Tensor Processing Primitives extension for PyTorch* ☆17, updated last week
- ☆148, updated this week
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆183, updated 5 months ago
- Microsoft Collective Communication Library ☆64, updated 7 months ago
- A lightweight design for computation-communication overlap. ☆143, updated last week
- ☆98, updated last year
- Dissecting NVIDIA GPU Architecture ☆97, updated 2 years ago
- ☆110, updated 3 weeks ago
- CUTLASS and CuTe Examples ☆57, updated 5 months ago
- Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators ☆112, updated 2 years ago
- ☆39, updated last year
- ☆12, updated 3 years ago
- LLM Inference analyzer for different hardware platforms ☆74, updated last month
- A home for the final text of all TVM RFCs. ☆105, updated 9 months ago
- ⚡️FFPA: Extend FlashAttention-2 with Split-D, achieve ~O(1) SRAM complexity for large headdim, 1.8x~3x↑ vs SDPA. ☆186, updated last month
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆106, updated this week
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆80, updated 2 years ago
- performance engineering ☆30, updated 11 months ago
- ☆35, updated last month