ROCm / gpuaidev
Repository to host ROCm Developer Hub Notebook Tutorials
☆11 · Updated 2 weeks ago
Alternatives and similar repositories for gpuaidev
Users who are interested in gpuaidev are comparing it to the libraries listed below.
- ☆25 · Updated this week
- ☆38 · Updated this week
- ☆62 · Updated 6 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆98 · Updated last month
- AI Tensor Engine for ROCm ☆208 · Updated this week
- rocWMMA ☆115 · Updated last week
- Experimental projects related to TensorRT ☆105 · Updated last week
- Development repository for the Triton language and compiler ☆125 · Updated this week
- ☆117 · Updated last month
- CUDA Matrix Multiplication Optimization ☆196 · Updated 11 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆106 · Updated this week
- A CUTLASS implementation using SYCL ☆27 · Updated this week
- CUTLASS and CuTe Examples ☆57 · Updated 5 months ago
- Ongoing research training transformer models at scale ☆23 · Updated 2 weeks ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆337 · Updated this week
- The goal of the OSSCI Fleet is to provide a central mechanism to enable test automation, batch job scheduling, and developer access to a … ☆12 · Updated last week
- ☆90 · Updated 5 months ago
- amdgpu example code in hip/asm ☆32 · Updated last week
- ☆27 · Updated last week
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆50 · Updated last week
- An experimental CPU backend for Triton ☆126 · Updated 3 weeks ago
- RCCL Performance Benchmark Tests ☆68 · Updated last month
- An extension library of WMMA API (Tensor Core API) ☆99 · Updated 11 months ago
- ☆98 · Updated last year
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆136 · Updated 4 years ago
- ☆148 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆427 · Updated this week
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆90 · Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆28 · Updated 3 months ago
- OpenAI Triton backend for Intel® GPUs ☆191 · Updated this week