ROCm / Tensile
Stretching GPU performance for GEMMs and tensor contractions.
☆233 Updated this week
Alternatives and similar repositories for Tensile:
Users interested in Tensile often compare it to the libraries listed below.
- rocWMMA ☆101 Updated this week
- ROCm Device Libraries ☆97 Updated 9 months ago
- AMD's graph optimization engine. ☆209 Updated this week
- ROCm BLAS marshalling library ☆132 Updated this week
- Next generation BLAS implementation for ROCm platform ☆362 Updated this week
- ROCm Parallel Primitives ☆170 Updated this week
- RAND library for HIP programming language ☆115 Updated this week
- ☆137 Updated this week
- ROCm Platform Runtime: ROCr, an HPC market-enhanced, HSA-based runtime ☆236 Updated this week
- This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific… ☆136 Updated this week
- ROC profiler library. Profiling with perf-counters and derived metrics. ☆135 Updated last week
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆76 Updated last year
- Next generation FFT implementation for ROCm ☆189 Updated this week
- ROCm Tracer Callback/Activity Library for Performance tracing AMD GPUs ☆79 Updated last week
- Assembler for NVIDIA Volta and Turing GPUs ☆214 Updated 3 years ago
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels. ☆128 Updated last year
- Next generation SPARSE implementation for ROCm platform ☆118 Updated this week
- ROCm Communication Collectives Library (RCCL) ☆303 Updated this week
- ☆60 Updated 2 months ago
- Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen) ☆34 Updated 5 months ago
- MIOpenGEMM is now deprecated ☆62 Updated last year
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- ROCm's Thunk Interface ☆85 Updated 2 months ago
- Collection of benchmarks to measure basic GPU capabilities ☆302 Updated 2 weeks ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆355 Updated this week
- An extension library of WMMA API (Tensor Core API) ☆90 Updated 7 months ago
- Reusable software components for ROCm developers ☆81 Updated this week
- Examples for HIP ☆203 Updated 2 months ago
- ROCm Thrust - run Thrust dependent software on AMD GPUs ☆106 Updated this week
- A system validation and diagnostics tool for monitoring, stress testing, detecting, and troubleshooting issues impacting AMD GPUs in high… ☆67 Updated this week