amd / fuzzyHSA
☆53 · Updated last year
Alternatives and similar repositories for fuzzyHSA
Users who are interested in fuzzyHSA are comparing it to the libraries listed below.
- ☆66 · Updated last year
- Super fast FP32 matrix multiplication on RDNA3 · ☆82 · Updated 9 months ago
- ☆451 · Updated 9 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… · ☆91 · Updated last week
- RDNA3 emulator · ☆55 · Updated 9 months ago
- LLM training in simple, raw C/HIP for AMD GPUs · ☆57 · Updated last year
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. · ☆311 · Updated last week
- Repository of model demos using TT-Buda · ☆63 · Updated 9 months ago
- ☆23 · Updated 2 months ago
- ☆180 · Updated last month
- Fast and Furious AMD Kernels · ☆336 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-systems repo · ☆42 · Updated last month
- ☆153 · Updated 2 weeks ago
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning · ☆318 · Updated last week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … · ☆52 · Updated 10 months ago
- Gpu benchmark · ☆73 · Updated 11 months ago
- A system validation and diagnostics tool for monitoring, stress testing, detecting, and troubleshooting issues impacting AMD GPUs in high… · ☆93 · Updated this week
- Make PyTorch models at least run on APUs. · ☆56 · Updated 2 years ago
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆113 · Updated this week
- Custom PTX Instruction Benchmark · ☆137 · Updated 10 months ago
- High-Performance SGEMM on CUDA devices · ☆115 · Updated 11 months ago
- User-Mode Driver for Tenstorrent hardware · ☆36 · Updated this week
- ☆119 · Updated this week
- ctypes wrappers for HIP, CUDA, and OpenCL · ☆130 · Updated last year
- Tensor Tiling Library · ☆38 · Updated 3 months ago
- Tenstorrent console based hardware information program · ☆58 · Updated this week
- [DEPRECATED] Moved to ROCm/rocm-libraries repo · ☆148 · Updated this week
- ☆514 · Updated this week
- asynchronous/distributed speculative evaluation for llama3 · ☆39 · Updated last year
- AMD related optimizations for transformer models · ☆97 · Updated 3 months ago