iacopPBK / llama.cpp-gfx906

llama.cpp-gfx906

☆49

Alternatives and similar repositories for llama.cpp-gfx906

Users who are interested in llama.cpp-gfx906 are comparing it to the libraries listed below.

mixa3607 / ML-gfx906
ML software (llama.cpp, ComfyUI, vLLM) builds for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆53 Updated last week
SearchSavior / OpenArc
Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, Embedding and Rerank models over OpenAI endpoints.
☆241 Updated last week
nktice / AMD-AI
AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1
☆216 Updated 2 weeks ago
nlzy / vllm-gfx906
vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆327 Updated last month
theroyallab / YALS
☆85 Updated last week
FastFlowLM / FastFlowLM
Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs.
☆451 Updated this week
stduhpf / stable-diffusion.cpp
Stable Diffusion and Flux in pure C/C++
☆22 Updated this week
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆1,329 Updated this week
perk11 / large-model-proxy
Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe…
☆83 Updated 3 weeks ago
anthonix / llm.c
LLM training in simple, raw C/HIP for AMD GPUs
☆54 Updated last year
Thireus / GGUF-Tool-Suite
Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes…
☆63 Updated this week
lemonade-sdk / llamacpp-rocm
Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration
☆103 Updated this week
turboderp-org / exllamav3
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆571 Updated last week
ROCm / TheRock
The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm
☆563 Updated this week
inferx-net / inferx
InferX: Inference as a Service Platform
☆138 Updated this week
Said-Akbar / vllm-rocm
FORK of VLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs
☆65 Updated 6 months ago
reinterpretcat / qwen3-rs
An educational Rust project for exporting and running inference on Qwen3 LLM family
☆34 Updated 3 months ago
ubergarm / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆21 Updated this week
amd / RyzenAI-SW
AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.
☆692 Updated 2 weeks ago
onnx / turnkeyml
No-code CLI designed for accelerating ONNX workflows
☆216 Updated 5 months ago
ROCm / flash-attention
Fast and memory-efficient exact attention
☆200 Updated last month
rafacelente / bllama
1.58-bit LLaMa model
☆83 Updated last year
matt-c1 / llama-3-quant-comparison
Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.
☆165 Updated last year
adriancable / qwen3.c
Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.
☆145 Updated 4 months ago
ROCm / bitsandbytes
8-bit CUDA functions for PyTorch
☆68 Updated last month
gigit0000 / qwen3.c
Lightweight C inference for Qwen3 GGUF. Multiturn prefix caching & batch processing.
☆17 Updated 2 months ago
Cornell-RelaxML / qtip
☆154 Updated 4 months ago
NimbleEdge / sparse_transformers
Sparse Inferencing for transformer based LLMs
☆208 Updated 3 months ago
aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆78 Updated 2 weeks ago
amd / fuzzyHSA
☆52 Updated last year