seb-v / fp32_sgemm_amd
Super fast FP32 matrix multiplication on RDNA3
☆79 Updated 7 months ago
Alternatives and similar repositories for fp32_sgemm_amd
Users who are interested in fp32_sgemm_amd are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆110 Updated 9 months ago
- A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators ☆120 Updated this week
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆160 Updated 4 months ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo ☆137 Updated this week
- amdgpu example code in hip/asm ☆46 Updated last week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 Updated 3 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆212 Updated 9 months ago
- Development repository for the Triton language and compiler ☆137 Updated last week
- ☆157 Updated this week
- Tenstorrent MLIR compiler