Said-Akbar / vllm-rocm
Fork of vLLM for AMD MI25/50/60 GPUs. A high-throughput and memory-efficient inference and serving engine for LLMs.
☆52 · Updated 2 months ago
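As context for the comparisons below, here is a minimal sketch of offline inference with vLLM's Python API; it assumes this fork keeps the upstream interface, and the model name and sampling settings are only illustrative, not taken from the fork's documentation.

```python
# Minimal sketch of offline inference with the vLLM Python API (assumed to be
# unchanged in this ROCm fork); model and sampling values are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # any supported HF model
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What does ROCm provide?"], params)
for out in outputs:
    print(out.outputs[0].text)
```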
Alternatives and similar repositories for vllm-rocm
Users interested in vllm-rocm are comparing it to the repositories listed below.
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler ☆27 · Updated 4 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆111 · Updated last week
- run DeepSeek-R1 GGUFs on KTransformers ☆242 · Updated 4 months ago
- LM inference server implementation based on *.cpp. ☆233 · Updated this week
- LLM inference in C/C++ ☆78 · Updated 3 weeks ago
- LLM inference in C/C++ ☆21 · Updated 3 months ago
- Review/Check GGUF files and estimate the memory usage and maximum tokens per second. ☆185 · Updated this week
- Implements harmful/harmless refusal removal using pure HF Transformers ☆949 · Updated last year
- automatically quant GGUF models ☆187 · Updated this week
- My development fork of llama.cpp. For now working on RK3588 NPU and Tenstorrent backend ☆97 · Updated 2 weeks ago
- Prepare for DeepSeek-R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 5 months ago
- Make PyTorch models at least run on APUs. ☆54 · Updated last year
- Lightweight inference server for OpenVINO ☆188 · Updated this week
- ☆95 · Updated 6 months ago
- AMD Ryzen™ AI Software includes the tools and runtime libraries for optimizing and deploying AI inference on AMD Ryzen™ AI powered PCs. ☆555 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ☆652 · Updated this week
- The all-in-one RWKV runtime box with embed, RAG, AI agents, and more. ☆571 · Updated last month
- A manual to help with using the Tesla P40 GPU ☆126 · Updated 8 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆100 · Updated last month
- ROCm library files for gfx1103, with updates for other AMD GPU architectures, for use on Windows. ☆549 · Updated 5 months ago
- ☆233 · Updated 2 years ago
- NVIDIA Linux open GPU with P2P support ☆25 · Updated last month
- Fully-featured, beautiful web interface for vLLM - built with NextJS. ☆146 · Updated 2 months ago
- GPU Power and Performance Manager ☆60 · Updated 9 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆134 · Updated last week
- Privacy-first agentic framework with powerful reasoning & task automation capabilities. Natively distributed and fully ISO 27XXX complian… ☆65 · Updated 3 months ago
- One-click deployment script for KTransformers ☆48 · Updated 2 months ago
- Train your own small BitNet model ☆74 · Updated 8 months ago
- AMD (Radeon GPU) ROCm based setup for popular AI tools on Ubuntu 24.04.1 ☆209 · Updated 4 months ago
- ☆90 · Updated last week