ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆126 Nov 14, 2025 Updated 3 months ago
Alternatives and similar repositories for amd_matrix_instruction_calculator
Users who are interested in amd_matrix_instruction_calculator compare it to the libraries listed below.
- amdgpu example code in hip/asm ☆55 Feb 1, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 Feb 3, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆138 Jan 26, 2026 Updated 2 weeks ago
- ☆112 Apr 19, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆520 Updated this week
- ☆18 Jan 17, 2024 Updated 2 years ago
- AMD HPC Research Fund Cloud ☆17 Jan 19, 2026 Updated 3 weeks ago
- Utilities for accessing AMD's Machine-Readable GPU ISA Specifications. ☆45 Sep 24, 2025 Updated 4 months ago
- ☆167 Feb 7, 2026 Updated last week
- ☆29 Feb 5, 2026 Updated last week
- ☆24 May 9, 2025 Updated 9 months ago
- A GPU FP32 computation method with Tensor Cores. ☆26 Dec 8, 2025 Updated 2 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆168 Updated this week
- AI Tensor Engine for ROCm ☆351 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆314 Updated this week
- Unit Scaling demo and experimentation code ☆16 Mar 12, 2024 Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆178 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆148 Jan 27, 2026 Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆255 Updated this week
- super repo for rocm libraries ☆243 Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆154 Jan 21, 2026 Updated 3 weeks ago
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner ☆21 Sep 12, 2025 Updated 5 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆134 Jan 21, 2026 Updated 3 weeks ago
- ☆18 Jun 6, 2025 Updated 8 months ago
- Express DLA implementation for FPGA, revised based on NVDLA. ☆11 Oct 17, 2019 Updated 6 years ago
- ☆11 Nov 14, 2023 Updated 2 years ago
- ☆11 Aug 21, 2023 Updated 2 years ago
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆110 Jun 28, 2024 Updated last year
- SYCL Reference Manual ☆30 Jan 29, 2026 Updated 2 weeks ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆75 Feb 1, 2026 Updated last week
- ☆60 Updated this week
- SYCL implementation of Fused MLPs for Intel GPUs ☆51 Nov 24, 2025 Updated 2 months ago
- Compute applications. ☆25 Dec 12, 2019 Updated 6 years ago
- collection of benchmarks to measure basic GPU capabilities ☆492 Oct 24, 2025 Updated 3 months ago
- AMD’s C++ library for accelerating tensor primitives ☆49 Feb 3, 2026 Updated last week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆144 Updated this week
- ☆19 Feb 5, 2026 Updated last week
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆57 Updated this week
- A collection of examples for the ROCm software stack ☆274 Feb 6, 2026 Updated last week