amd / fuzzyHSA
☆54 Updated 11 months ago
Alternatives and similar repositories for fuzzyHSA
Users who are interested in fuzzyHSA are comparing it to the libraries listed below.
- Schola is a plugin for enabling Reinforcement Learning (RL) in Unreal Engine. It provides tools to help developers create environments, d… ☆42 Updated 3 weeks ago
- Super fast FP32 matrix multiplication on RDNA3 ☆61 Updated 2 months ago
- ☆58 Updated 10 months ago
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆126 Updated this week
- LLM training in simple, raw C/HIP for AMD GPUs ☆50 Updated 8 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆64 Updated 2 weeks ago
- ☆102 Updated this week
- ☆445 Updated last month
- User-Mode Driver for Tenstorrent hardware ☆22 Updated this week
- ☆20 Updated 3 weeks ago
- Derived from Nemes' gpuperftests ☆30 Updated 10 months ago
- AI Tensor Engine for ROCm ☆201 Updated this week
- High-Performance SGEMM on CUDA devices ☆94 Updated 4 months ago
- Tensor Tiling Library ☆36 Updated last month
- Bandwidth test for ROCm ☆56 Updated 2 weeks ago
- tenstorrent kernel from twitch ☆27 Updated last year
- Repository of model demos using TT-Buda ☆62 Updated 2 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆59 Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆35 Updated 10 months ago
- ☆30 Updated this week
- RDNA3 emulator ☆54 Updated last month
- Custom PTX Instruction Benchmark ☆126 Updated 3 months ago
- The Riallto Open Source Project from AMD ☆79 Updated last month
- Make PyTorch models at least run on APUs. ☆55 Updated last year
- rocWMMA ☆114 Updated this week
- Development repository for the Triton language and compiler ☆122 Updated this week
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆277 Updated this week
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression. ☆11 Updated 8 months ago
- GPUOcelot: A dynamic compilation framework for PTX ☆192 Updated 3 months ago
- 8-bit CUDA functions for PyTorch, ported to HIP for use in AMD GPUs ☆49 Updated 2 years ago