Said-Akbar / vllm-rocm
Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs.
☆49 · Updated last month
Alternatives and similar repositories for vllm-rocm
Users interested in vllm-rocm are comparing it to the libraries listed below.
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler. ☆26 · Updated 2 months ago
- vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60. ☆63 · Updated last week
- LLM inference in C/C++. ☆21 · Updated 2 months ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations. ☆40 · Updated last month
- A vLLM-based deployment tool (with GUI) for running large models in a mixed VRAM/DRAM mode; the VRAM-and-DRAM mode is slower, but it makes deploying very large models on ordinary home computers feasible. ☆64 · Updated last month
- run DeepSeek-R1 GGUFs on KTransformers. ☆234 · Updated 3 months ago
- ☆89 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆79 · Updated this week
- ☆120 · Updated 2 weeks ago
- GLM-4 series: Open Multilingual Multimodal Chat LMs. ☆26 · Updated last month
- Make PyTorch models at least run on APUs. ☆55 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance. ☆519 · Updated this week
- ☆36 · Updated this week
- ROCm library files for gfx1103, with updates for other AMD GPU architectures, for use on Windows. ☆514 · Updated 4 months ago
- build scripts for ROCm. ☆186 · Updated last year
- Minimal Linux OS with a Model Context Protocol (MCP) gateway to expose local capabilities to LLMs. ☆228 · Updated last week
- This project simplifies the installation process of likelovewant's library, making it easier for users to manage and update their AMD GPU… ☆169 · Updated 2 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆116 · Updated 11 months ago
- Privacy-first agentic framework with powerful reasoning & task automation capabilities. Natively distributed and fully ISO 27XXX complian… ☆65 · Updated 2 months ago
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning. ☆22 · Updated 2 weeks ago
- User-friendly AI Interface (supports Ollama, OpenAI API, ...). ☆266 · Updated 2 months ago
- ☆90 · Updated 5 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 4 months ago
- NVIDIA Linux open GPU with P2P support. ☆25 · Updated 2 weeks ago
- InferX is an Inference Function-as-a-Service platform. ☆106 · Updated this week
- LLM inference in C/C++. ☆77 · Updated 3 weeks ago
- Implements harmful/harmless refusal removal using pure HF Transformers. ☆852 · Updated 11 months ago
- Lightweight inference server for OpenVINO. ☆180 · Updated this week
- Automatically quantize GGUF models. ☆181 · Updated this week
- LM inference server implementation based on *.cpp. ☆203 · Updated last week