pomoke / torch-apu-helper
Make PyTorch models at least run on APUs.
☆57 · Updated last year
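The snippet below is a minimal sketch of the pluggable-allocator route a helper like this can take on ROCm builds of PyTorch: device allocations are redirected to a custom allocator that draws from the APU's shared GTT memory instead of the small dedicated VRAM carve-out. `CUDAPluggableAllocator` and `change_current_allocator` are the stock PyTorch hooks (HIP-backed on ROCm builds); the shared-library name `gtt_alloc.so` and the exported symbols `gtt_alloc`/`gtt_free` are hypothetical placeholders, not names confirmed from this repo.

```python
# Hypothetical sketch, not the repo's confirmed code: route PyTorch device
# allocations through a custom allocator so an APU can use shared GTT memory.
import torch

# Load a compiled C/HIP shared library that exports two symbols with the
# signatures PyTorch's pluggable allocator expects:
#   void* gtt_alloc(ssize_t size, int device, cudaStream_t stream);
#   void  gtt_free(void* ptr, ssize_t size, int device, cudaStream_t stream);
# (library and symbol names here are assumptions for illustration)
allocator = torch.cuda.memory.CUDAPluggableAllocator(
    "./gtt_alloc.so", "gtt_alloc", "gtt_free"
)

# Must be installed before the first device allocation in the process.
torch.cuda.memory.change_current_allocator(allocator)

# Every 'cuda' tensor (a HIP device on ROCm) now goes through gtt_alloc.
x = torch.randn(1024, 1024, device="cuda")
```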
Alternatives and similar repositories for torch-apu-helper
Users interested in torch-apu-helper are comparing it to the libraries listed below.
- ☆63 · Updated 6 months ago
- ☆493 · Updated this week
- Fork of ollama for Vulkan support ☆107 · Updated 9 months ago
- Build scripts for ROCm ☆188 · Updated last year
- ☆414 · Updated 7 months ago
- ☆234 · Updated 2 years ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆103 · Updated this week
- Deep Learning Primitives and Mini-Framework for OpenCL ☆204 · Updated last year
- 8-bit CUDA functions for PyTorch ☆68 · Updated last month
- Onboarding documentation source for the AMD Ryzen™ AI Software Platform. The AMD Ryzen™ AI Software Platform enables developers to take… ☆87 · Updated this week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm ☆563 · Updated this week
- Linux-based GDDR6/GDDR6X VRAM temperature reader for NVIDIA RTX 3000/4000 series GPUs ☆104 · Updated 6 months ago
- Reverse engineering the RK3588 NPU ☆99 · Updated last year
- ☆48 · Updated 2 years ago
- DLPrimitives/OpenCL out-of-tree backend for PyTorch ☆377 · Updated last year
- Because RKNPU only knows 4D ☆39 · Updated last year
- ☆52 · Updated last year
- My development fork of llama.cpp. For now working on RK3588 NPU and Tenstorrent backends ☆108 · Updated this week
- Run stable-diffusion-webui with Radeon RX 580 8GB on Ubuntu 22.04.2 LTS ☆68 · Updated 2 years ago
- ROCm Docker images with fixes/support for the legacy architecture gfx803, e.g. Radeon RX 590/RX 580/RX 570/RX 480 ☆76 · Updated 5 months ago
- ☆63 · Updated last year
- GPU Power and Performance Manager ☆61 · Updated last year
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs ☆451 · Updated this week
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆216 · Updated 2 weeks ago
- No-code CLI designed for accelerating ONNX workflows ☆216 · Updated 5 months ago
- Run Large Language Models on RK3588 with GPU acceleration ☆117 · Updated 2 years ago
- Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs ☆65 · Updated 6 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆63 · Updated 2 years ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆73 · Updated 9 months ago
- General site for the GFX803 ROCm stuff ☆126 · Updated 2 months ago
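Several of the repos above (the gfx803 Docker images, the ROCm build scripts, the Ubuntu setup guides) exist mainly to get ROCm PyTorch to recognize consumer GPUs and APUs at all. Below is a quick sanity check along those lines, assuming a ROCm build of PyTorch; `HSA_OVERRIDE_GFX_VERSION` is a real ROCm runtime override, but whether you need it, and which value, depends on your hardware, so the `"10.3.0"` shown is only an example.

```python
# Sanity check for a ROCm build of PyTorch on an APU or unsupported GPU.
import os

# Must be set before torch initializes the HIP runtime; value is an example.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")

import torch

print("HIP build:", torch.version.hip)           # None on CUDA-only builds
print("Device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    # A tiny matmul confirms kernels actually launch on the device.
    a = torch.randn(256, 256, device="cuda")
    print("Matmul OK:", (a @ a).sum().item())
```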