pkucnc / awesome-rocm
Collections and tutorials for ROCm
☆30, updated 8 months ago
Alternatives and similar repositories for awesome-rocm
Users that are interested in awesome-rocm are comparing it to the libraries listed below
- This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PT… (☆439, updated 6 months ago)
- A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software (☆60, updated 11 months ago)
- An experimental CPU backend for Triton (☆173, updated 2 months ago)
- DeepSeek-V3/R1 inference performance simulator (☆176, updated 10 months ago)
- [DEPRECATED] Moved to ROCm/rocm-systems repo (☆144, updated last week)
- Advanced Matrix Extensions (AMX) Guide (☆108, updated 4 years ago)
- CUTLASS and CuTe Examples (☆117, updated 2 months ago)
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators (☆515, updated this week)
- CUDA Matrix Multiplication Optimization (☆256, updated last year)
- Awesome resources for GPUs (☆608, updated 2 years ago)
- A static analytical model for LLM distributed training (☆114, updated 3 weeks ago)
- LLM Inference analyzer for different hardware platforms (☆99, updated last month)
- BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment. (☆749, updated 5 months ago)
- Shared Middle-Layer for Triton Compilation (☆324, updated last month)
- Intel® Tensor Processing Primitives extension for Pytorch* (☆18, updated 2 weeks ago)
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance. (☆148, updated 8 months ago)
- Collection of benchmarks to measure basic GPU capabilities (☆489, updated 3 months ago)
- FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang… (☆197, updated this week)
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. (☆96, updated 2 years ago)
- A lightweight design for computation-communication overlap. (☆213, updated last week)
- (☆111, updated last year)
- Solution of Programming Massively Parallel Processors (☆49, updated 2 years ago)
- Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend (☆105, updated this week)
- (☆53, updated 10 months ago)
- Tile-based language built for AI computation across all scales (☆119, updated this week)
- (☆93, updated 10 months ago)
- AI Tensor Engine for ROCm (☆348, updated this week)
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… (☆298, updated 2 weeks ago)
- oneAPI Collective Communications Library (oneCCL) (☆253, updated last month)
- Development repository for the Triton-Linalg conversion (☆214, updated 11 months ago)