A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆127 · Nov 14, 2025 · Updated 3 months ago
Alternatives and similar repositories for amd_matrix_instruction_calculator
Users who are interested in amd_matrix_instruction_calculator are comparing it to the repositories listed below
- amdgpu example code in hip/asm ☆56 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆165 · Feb 16, 2026 · Updated 2 weeks ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆139 · Feb 27, 2026 · Updated last week
- ☆112 · Apr 19, 2024 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo. NOTE: develop branch is maintained as a read-only mirror ☆523 · Updated this week
- ☆18 · Jan 17, 2024 · Updated 2 years ago
- AMD HPC Research Fund Cloud ☆17 · Feb 16, 2026 · Updated 2 weeks ago
- Utilities for accessing AMD's Machine-Readable GPU ISA Specifications. ☆46 · Sep 24, 2025 · Updated 5 months ago
- ☆169 · Updated this week
- ☆30 · Updated this week
- ☆24 · May 9, 2025 · Updated 9 months ago
- A GPU FP32 computation method with Tensor Cores. ☆26 · Dec 8, 2025 · Updated 3 months ago
- AI Tensor Engine for ROCm ☆367 · Updated this week
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆179 · Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆320 · Updated this week
- Unit Scaling demo and experimentation code ☆16 · Mar 12, 2024 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆177 · Feb 26, 2026 · Updated last week
- Repository with examples and exercises for OLCF and AMD's HIP training series ☆17 · Oct 16, 2023 · Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆148 · Jan 27, 2026 · Updated last month
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆256 · Updated this week
- study of cutlass ☆22 · Nov 10, 2024 · Updated last year
- [DEPRECATED] Moved to ROCm/rocm-systems repo ☆153 · Jan 21, 2026 · Updated last month
- super repo for rocm libraries ☆268 · Updated this week
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner ☆21 · Sep 12, 2025 · Updated 5 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆134 · Feb 26, 2026 · Updated last week
- ☆18 · Jun 6, 2025 · Updated 9 months ago
- ☆11 · Nov 14, 2023 · Updated 2 years ago
- Express DLA implementation for FPGA, revised based on NVDLA. ☆11 · Oct 17, 2019 · Updated 6 years ago
- ☆11 · Aug 21, 2023 · Updated 2 years ago
- CPU and GPU tutorial examples ☆13 · Apr 4, 2025 · Updated 11 months ago
- AMD lab notes with code examples to demonstrate use of AMD GPUs ☆108 · Jun 28, 2024 · Updated last year
- SYCL Reference Manual ☆30 · Feb 11, 2026 · Updated 3 weeks ago
- Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X ☆75 · Feb 11, 2026 · Updated 3 weeks ago
- SYCL implementation of Fused MLPs for Intel GPUs ☆50 · Nov 24, 2025 · Updated 3 months ago
- ☆116 · Updated this week
- Compute applications. ☆25 · Dec 12, 2019 · Updated 6 years ago
- collection of benchmarks to measure basic GPU capabilities ☆498 · Oct 24, 2025 · Updated 4 months ago
- AMD’s C++ library for accelerating tensor primitives ☆49 · Feb 18, 2026 · Updated 2 weeks ago
- ☆63 · Updated this week