amd / fuzzyHSA
☆54 Updated 10 months ago
Alternatives and similar repositories for fuzzyHSA:
Users who are interested in fuzzyHSA are comparing it to the repositories listed below.
- Super fast FP32 matrix multiplication on RDNA3 ☆46 Updated 2 weeks ago
- ☆56 Updated 9 months ago
- High-Performance SGEMM on CUDA devices ☆90 Updated 2 months ago
- ☆19 Updated 2 weeks ago
- AI Tensor Engine for ROCm ☆168 Updated this week
- Schola is a plugin for enabling Reinforcement Learning (RL) in Unreal Engine. It provides tools to help developers create environments, d… ☆34 Updated 2 weeks ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆48 Updated this week
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆55 Updated last week
- RDNA3 emulator ☆54 Updated this week
- Custom PTX Instruction Benchmark ☆123 Updated last month
- ROCm BLAS marshalling library ☆136 Updated this week
- Tensor Tiling Library ☆36 Updated this week
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona… ☆90 Updated this week
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆39 Updated this week
- ctypes wrappers for HIP, CUDA, and OpenCL ☆129 Updated 9 months ago
- Bandwidth test for ROCm ☆54 Updated last week
- rocWMMA ☆108 Updated this week
- OpenCL/SPIR-V implementation of HIP ☆104 Updated 2 years ago
- NVIDIA Linux open GPU with P2P support ☆16 Updated 3 weeks ago
- AMD’s C++ library for accelerating tensor primitives ☆39 Updated this week
- Online compiler for HIP and NVIDIA® CUDA® code to WebGPU ☆143 Updated 3 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆185 Updated 2 months ago
- ☆105 Updated last week
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- CMake modules used within the ROCm libraries ☆65 Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. ☆50 Updated last month
- Development repository for the Triton language and compiler ☆118 Updated this week
- ☆441 Updated 2 weeks ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 Updated this week
- The AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a… ☆17 Updated last week