lcpu-club / awesome-rocm
Collections and tutorials for ROCm
☆25 Updated last year
Alternatives and similar repositories for awesome-rocm:
Users who are interested in awesome-rocm compare it to the repositories listed below
- 🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆250 Updated last week
- Intel® Tensor Processing Primitives extension for PyTorch* ☆14 Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆113 Updated last month
- Advanced Matrix Extensions (AMX) Guide ☆88 Updated 3 years ago
- This is the top-level repository for the Accel-Sim framework. ☆395 Updated last week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆77 Updated this week
- oneAPI Collective Communications Library (oneCCL) ☆232 Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆389 Updated this week
- Microsoft Collective Communication Library ☆65 Updated 5 months ago
- ☆93 Updated this week
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆87 Updated 2 years ago
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆78 Updated 5 months ago
- Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs. ☆78 Updated 2 years ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆134 Updated last week
- ☆95 Updated last year
- OpenAI Triton backend for Intel® GPUs ☆182 Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆342 Updated this week
- Unified Collective Communication Library ☆248 Updated this week
- LLM Inference analyzer for different hardware platforms ☆62 Updated 3 weeks ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆60 Updated this week
- Experiments and prototypes associated with IREE or MLIR ☆50 Updated 8 months ago
- ☆141 Updated this week
- ☆138 Updated 9 months ago
- ☆60 Updated last year
- CUDA Templates for Linear Algebra Subroutines ☆20 Updated this week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆236 Updated last week
- ☆39 Updated 10 months ago
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆255 Updated last month
- Dissecting NVIDIA GPU Architecture ☆92 Updated 2 years ago
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel … ☆181 Updated 2 months ago