amd / fuzzyHSA
☆54 · Updated 8 months ago
Alternatives and similar repositories for fuzzyHSA:
Users who are interested in fuzzyHSA compare it to the repositories listed below.
- ☆53 · Updated 7 months ago
- RDNA3 emulator ☆50 · Updated 2 weeks ago
- High-Performance SGEMM on CUDA devices ☆73 · Updated 3 weeks ago
- LLM training in simple, raw C/HIP for AMD GPUs ☆40 · Updated 4 months ago
- ☆426 · Updated 2 months ago
- Gpu benchmark ☆52 · Updated 2 weeks ago
- Tensor Tiling Library ☆34 · Updated 5 months ago
- Repository of model demos using TT-Buda ☆61 · Updated 2 months ago
- chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs. ☆249 · Updated this week
- ☆18 · Updated 4 months ago
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆51 · Updated this week
- User-Mode Driver for Tenstorrent hardware ☆14 · Updated this week
- Nvidia Instruction Set Specification Generator ☆241 · Updated 7 months ago
- ROCm Application for Reporting System Info ☆37 · Updated this week
- tenstorrent kernel from twitch ☆27 · Updated 10 months ago
- Fork of LLVM to support AMD AIEngine processors ☆123 · Updated this week
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit ☆118 · Updated 9 months ago
- rocWMMA ☆100 · Updated this week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆45 · Updated 4 months ago
- llama.cpp fork used by GPT4All ☆51 · Updated this week
- Bandwidth test for ROCm ☆54 · Updated this week
- GNA - Gaussian & Neural Accelerator Library repository ☆92 · Updated 9 months ago
- The Riallto Open Source Project from AMD ☆71 · Updated 3 months ago
- ☆217 · Updated last week
- asynchronous/distributed speculative evaluation for llama3 ☆37 · Updated 6 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- AMD’s C++ library for accelerating tensor primitives ☆37 · Updated this week
- Emulating double-precision arithmetic on Apple GPUs ☆48 · Updated last year