lcpu-club / awesome-rocm
Collections and tutorials for ROCm
☆25 · Updated last year
Alternatives and similar repositories for awesome-rocm:
Users who are interested in awesome-rocm compare it to the repositories listed below:
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… ☆229 · Updated last week
- An experimental CPU backend for Triton ☆101 · Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆83 · Updated this week
- ☆138 · Updated this week
- Python SYCL bindings and SYCL-based Python Array API library ☆110 · Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆64 · Updated this week
- Intel® Tensor Processing Primitives extension for PyTorch* ☆12 · Updated last week
- ☆25 · Updated this week
- Hands-On Practical MLIR Tutorial ☆20 · Updated 8 months ago
- oneAPI Collective Communications Library (oneCCL) ☆227 · Updated last week
- Microsoft Collective Communication Library ☆60 · Updated 4 months ago
- ☆37 · Updated this week
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆130 · Updated last week
- Advanced Matrix Extensions (AMX) Guide ☆83 · Updated 3 years ago
- performance engineering ☆30 · Updated 8 months ago
- Benchmark Framework for Buddy Projects ☆53 · Updated last month
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last week
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions. ☆59 · Updated 2 weeks ago
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 · Updated this week
- A collection of examples for the ROCm software stack ☆194 · Updated this week
- Experiments and prototypes associated with IREE or MLIR ☆50 · Updated 7 months ago
- Shared Middle-Layer for Triton Compilation ☆233 · Updated 2 weeks ago
- Unified Collective Communication Library ☆239 · Updated this week
- ☆91 · Updated last week
- ☆30 · Updated 2 years ago
- LLM Inference analyzer for different hardware platforms ☆54 · Updated last week
- IREE's PyTorch Frontend, based on Torch Dynamo. ☆74 · Updated this week
- Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial ☆251 · Updated last week
- AI Tensor Engine for ROCm ☆119 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated 2 weeks ago