nlzy / vllm-gfx906
vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆63 · Updated last week
Alternatives and similar repositories for vllm-gfx906
Users interested in vllm-gfx906 are comparing it to the libraries listed below
- Fork of vLLM for AMD MI25/50/60. A high-throughput and memory-efficient inference and serving engine for LLMs ☆49 · Updated last month
- Triton for AMD MI25/50/60. Development repository for the Triton language and compiler ☆26 · Updated 2 months ago
- Run DeepSeek-R1 GGUFs on KTransformers ☆234 · Updated 3 months ago
- A vLLM-based tool (with a GUI) for deploying large models in a mixed VRAM-and-DRAM mode; this mode is somewhat slower, but it solves the problem of deploying very large models on ordinary home computers. ☆64 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ☆519 · Updated this week
- LM inference server implementation based on *.cpp. ☆203 · Updated last week
- Implements harmful/harmless refusal removal using pure HF Transformers ☆852 · Updated 11 months ago
- One-click deployment script for KTransformers ☆45 · Updated last month
- This project simplifies the installation process of likelovewant's library, making it easier for users to manage and update their AMD GPU… ☆169 · Updated 2 months ago
- ROCm library files for gfx1103, with updates for other AMD GPU architectures, for use on Windows. ☆514 · Updated 4 months ago
- Review/check GGUF files and estimate the memory usage and maximum tokens per second. ☆173 · Updated this week
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆90 · Updated 2 weeks ago
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability. ☆1,134 · Updated this week
- A Python package that extends official PyTorch to easily unlock extra performance on Intel platforms ☆47 · Updated 5 months ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆40 · Updated last month
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆385 · Updated this week
- Lightweight inference server for OpenVINO ☆180 · Updated this week
- Run LLM Agents on Ryzen AI PCs in Minutes ☆393 · Updated this week
- A pure Rust LLM inference engine (supporting any LLM-based MLLM such as Spark-TTS), powered by the Candle framework. ☆120 · Updated 2 months ago
- ☆36 · Updated this week
- Uses the pipelines feature of open-webui to call ragflow agents from within open-webui, enabling knowledge-base-driven chat with an attractive interface. ☆76 · Updated last month
- User-friendly AI Interface (Supports Ollama, OpenAI API, ...) ☆266 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆51 · Updated 7 months ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆969 · Updated this week
- Manage GPU clusters for running AI models ☆2,813 · Updated this week
- Model swapping for llama.cpp (or any local OpenAI-compatible server) ☆848 · Updated last week
- Privacy-first agentic framework with powerful reasoning & task automation capabilities. Natively distributed and fully ISO 27XXX complian… ☆65 · Updated 2 months ago
- AMD (Radeon GPU) ROCm-based setup for popular AI tools on Ubuntu 24.04.1 ☆204 · Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆614 · Updated last week
- ☆146 · Updated 2 months ago