Said-Akbar / vllm-rocm
A fork of vLLM for AMD MI25/MI50/MI60 GPUs. A high-throughput and memory-efficient inference and serving engine for LLMs.
☆65 · Updated 8 months ago
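For orientation, here is a minimal offline-inference sketch, assuming the fork preserves upstream vLLM's Python API; the model name and sampling values are illustrative, not taken from this repository:

```python
# Minimal offline-inference sketch, assuming this fork keeps upstream
# vLLM's Python API. Model name and sampling values are illustrative;
# on MI50/MI60 a small model is a realistic starting point.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() returns one RequestOutput per prompt.
outputs = llm.generate(["What is ROCm?"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```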
Alternatives and similar repositories for vllm-rocm
Users interested in vllm-rocm are comparing it to the libraries listed below.
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler ☆32 · Updated last month
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆367 · Updated last month
- ML software (llama.cpp, ComfyUI, vLLM) builds for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆107 · Updated 2 months ago
- LM inference server implementation based on *.cpp. ☆294 · Updated 2 months ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆236 · Updated 3 weeks ago
- Inference engine for Intel devices. Serve LLMs, VLMs, Whisper, Kokoro-TTS, embedding, and rerank models over OpenAI-compatible endpoints (see the client sketch after this list). ☆290 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆166 · Updated 5 months ago
- llama.cpp-gfx906 ☆87 · Updated 2 weeks ago
- Fresh builds of llama.cpp with AMD ROCm™ 7 acceleration ☆176 · Updated this week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆88 · Updated this week
- GPU Power and Performance Manager ☆66 · Updated last year
- Download models from the Ollama library, without Ollama ☆121 · Updated last year
- LLM inference in C/C++ ☆104 · Updated this week
- Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama, but purpose-built and deeply optimized for AMD NPUs. ☆689 · Updated this week
- Triton for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60 ☆40 · Updated last month
- Run DeepSeek-R1 GGUFs on KTransformers ☆260 · Updated 10 months ago
- Running SXM2/SXM3/SXM4 NVIDIA data center GPUs in consumer PCs ☆138 · Updated 2 years ago
- Automatically quantize GGUF models ☆219 · Updated last month
- Produce your own Dynamic 3.0 Quants and achieve optimum accuracy & SOTA quantization performance! Input your VRAM and RAM and the toolcha… ☆76 · Updated this week
- A manual to help with using the Tesla P40 GPU ☆142 · Updated last year
- ☆109 · Updated 5 months ago
- Docker compose to run vLLM on Windows ☆114 · Updated 2 years ago
- NVIDIA Linux open GPU with P2P support ☆126 · Updated last month
- The HIP Environment and ROCm Kit: a lightweight open source build system for HIP and ROCm ☆121 · Updated 2 weeks ago
- whisper-cpp-serve: real-time speech recognition with OpenAI's Whisper model in C/C++ ☆72 · Updated last year
- Code execution utilities for Open WebUI & Ollama ☆318 · Updated last year
- A fork of vLLM enabling Pascal architecture GPUs ☆32 · Updated 11 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆192 · Updated last month
- InferX: Inference as a Service Platform ☆154 · Updated this week
- No-code CLI designed for accelerating ONNX workflows ☆226 · Updated 7 months ago
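Several of the servers above (vllm-rocm itself, the Intel inference engine, the TTS/STT server) expose OpenAI-compatible HTTP endpoints. The sketch below shows one way to query such an endpoint from the standard library; the URL, port, and model name are illustrative assumptions, not values from any listed project:

```python
# Minimal client sketch for an OpenAI-compatible /v1/completions endpoint,
# as exposed by vLLM and several servers in the list above.
# URL, port, and model name are illustrative assumptions.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps({
        "model": "facebook/opt-125m",
        "prompt": "What is ROCm?",
        "max_tokens": 64,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Passing data= makes this a POST; the response body is JSON.
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["choices"][0]["text"])
```

Because all of these servers share the same wire format, the same client code can be pointed at any of them by changing only the base URL and model name.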