kennethdsheridan / rocm_gpu_tradecraft
Commands that will make you more comfortable with the ROCm toolkit.
☆17 Updated last year
Alternatives and similar repositories for rocm_gpu_tradecraft
Users who are interested in rocm_gpu_tradecraft are comparing it to the libraries listed below.
- ☆27 Updated last week
- RCCL Performance Benchmark Tests ☆76 Updated last week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆356 Updated this week
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆66 Updated 3 months ago
- Development repository for the Triton language and compiler ☆131 Updated this week
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 Updated 3 weeks ago
- AI Tensor Engine for ROCm ☆279 Updated this week
- OpenAI Triton backend for Intel® GPUs ☆208 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆381 Updated this week
- AMD RAD's experimental RMA library for Triton. ☆74 Updated last week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆465 Updated this week
- oneCCL Bindings for Pytorch* ☆102 Updated last month
- Ahead of Time (AOT) Triton Math Library ☆76 Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 Updated 2 months ago
- rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform. ☆117 Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 Updated last year
- collection of benchmarks to measure basic GPU capabilities ☆419 Updated 7 months ago
- CUDA Kernel Benchmarking Library ☆724 Updated last week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆116 Updated 4 months ago
- Kernel Tuner ☆361 Updated last week
- ☆57 Updated this week
- SYCL based CUTLASS implementation for Intel GPUs ☆39 Updated this week
- Training material for Nsight developer tools ☆167 Updated last year
- ☆45 Updated this week
- An experimental CPU backend for Triton ☆153 Updated 3 months ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆69 Updated last month
- ☆240 Updated this week
- ☆42 Updated last week
- ☆62 Updated 9 months ago
- PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for… ☆151 Updated 3 weeks ago