pkucnc / awesome-rocm
Collections and tutorials for ROCm
☆27 · Updated 3 months ago
Alternatives and similar repositories for awesome-rocm
Users who are interested in awesome-rocm are comparing it to the repositories listed below.
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆313 · Updated last month
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆452 · Updated this week
- Collection of benchmarks to measure basic GPU capabilities ☆411 · Updated 6 months ago
- Advanced Matrix Extensions (AMX) Guide ☆95 · Updated 3 years ago
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆50 · Updated 6 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆100 · Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆165 · Updated 5 months ago
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. ☆663 · Updated 3 weeks ago
- This is the top-level repository for the Accel-Sim framework. ☆464 · Updated 3 weeks ago
- LLM Inference analyzer for different hardware platforms ☆87 · Updated last month
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆293 · Updated 2 months ago
- CUTLASS and CuTe Examples ☆72 · Updated last month
- An MLIR-based toolchain for AMD AI Engine-enabled devices. ☆467 · Updated this week
- Awesome resources for GPUs ☆585 · Updated 2 years ago
- AI Tensor Engine for ROCm ☆260 · Updated this week
- Stepwise optimizations of DGEMM on CPU, reaching performance faster than Intel MKL eventually, even under multithreading. ☆152 · Updated 3 years ago
- FlagTree is a unified compiler for multiple AI chips, which is forked from triton-lang/triton. ☆76 · Updated this week
- A collection of examples for the ROCm software stack ☆236 · Updated this week
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆84 · Updated 2 years ago
- CUDA Matrix Multiplication Optimization ☆218 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆111 · Updated this week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆373 · Updated 7 months ago
- Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap… ☆266 · Updated 5 months ago
- CUDA PTX-ISA Document, Chinese translation ☆44 · Updated 3 months ago
- Dissecting NVIDIA GPU Architecture ☆104 · Updated 3 years ago
- Intel® Tensor Processing Primitives extension for Pytorch* ☆17 · Updated 3 weeks ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆288 · Updated 2 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89 · Updated 2 years ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆261 · Updated 2 weeks ago
- RCCL Performance Benchmark Tests ☆73 · Updated last week