Said-Akbar / vllm-rocm

Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs

☆52

Alternatives and similar repositories for vllm-rocm

Users who are interested in vllm-rocm are comparing it to the repositories listed below.

Said-Akbar / triton-gcn5
Triton for AMD MI25/50/60. Development repository for the Triton language and compiler
☆27 Updated 4 months ago
nlzy / vllm-gfx906
vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆111 Updated last week
ubergarm / r1-ktransformers-guide
Run DeepSeek-R1 GGUFs on KTransformers
☆242 Updated 4 months ago
gpustack / llama-box
LM inference server implementation based on *.cpp.
☆233 Updated this week
unslothai / llama.cpp
LLM inference in C/C++
☆78 Updated 3 weeks ago
fairydreaming / llama.cpp
LLM inference in C/C++
☆21 Updated 3 months ago
gpustack / gguf-parser-go
Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
☆185 Updated this week
Sumandora / remove-refusals-with-transformers
Implements harmful/harmless refusal removal using pure HF Transformers
☆949 Updated last year
leafspark / AutoGGUF
Automatically quantize GGUF models
☆187 Updated this week
marty1885 / llama.cpp
My development fork of llama.cpp. For now working on RK3588 NPU and Tenstorrent backend
☆97 Updated 2 weeks ago
BlinkDL / fast.c
Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆72 Updated 5 months ago
pomoke / torch-apu-helper
Make PyTorch models at least run on APUs.
☆54 Updated last year
SearchSavior / OpenArc
Lightweight Inference server for OpenVINO
☆188 Updated this week
chigkim / Ollama-MMLU-Pro
☆95 Updated 6 months ago
amd / RyzenAI-SW
AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs.
☆555 Updated last week
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆652 Updated this week
Ai00-X / ai00_server
The all-in-one RWKV runtime box with embed, RAG, AI agents, and more.
☆571 Updated last month
JingShing / How-to-use-tesla-p40
A manual to help with using the Tesla P40 GPU
☆126 Updated 8 months ago
sasha0552 / pascal-pkgs-ci
The main repository for building Pascal-compatible versions of ML applications and libraries.
☆100 Updated last month
likelovewant / ROCmLibs-for-gfx1103-AMD780M-APU
ROCm library files for gfx1103, updated with other architectures, for AMD GPUs on Windows.
☆549 Updated 5 months ago
xuhuisheng / rocm-gfx803
☆233 Updated 2 years ago
aikitoria / open-gpu-kernel-modules
NVIDIA Linux open GPU with P2P support
☆25 Updated last month
yoziru / nextjs-vllm-ui
Fully-featured, beautiful web interface for vLLM - built with NextJS.
☆146 Updated 2 months ago
crashr / gppm
GPU Power and Performance Manager
☆60 Updated 9 months ago
gpustack / vox-box
A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends.
☆134 Updated last week
Independent-AI-Labs / local-super-agents
Privacy-first agentic framework with powerful reasoning & task automation capabilities. Natively distributed and fully ISO 27XXX complian…
☆65 Updated 3 months ago
maaaxinfinity / ktrun
KTransformers one-click deployment script
☆48 Updated 2 months ago
pranavjad / tinyllama-bitnet
Train your own small bitnet model
☆74 Updated 8 months ago
nktice / AMD-AI
AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1
☆209 Updated 4 months ago
and270 / thinking_effort_processor
☆90 Updated last week