nlzy / vllm-gfx906
vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
☆79 · Updated 2 weeks ago
Alternatives and similar repositories for vllm-gfx906
Users interested in vllm-gfx906 are comparing it to the libraries listed below.
- Fork of vLLM for AMD MI25/MI50/MI60. A high-throughput and memory-efficient inference and serving engine for LLMs. ☆50 · Updated last month
- Triton for AMD MI25/MI50/MI60. Development repository for the Triton language and compiler. ☆26 · Updated 3 months ago
- One-click deployment script for KTransformers. ☆47 · Updated 2 months ago
- LM inference server implementation based on *.cpp. ☆226 · Updated this week
- Review/check GGUF files and estimate their memory usage and maximum tokens per second. ☆177 · Updated last week
- A vLLM-based deployment tool (with a GUI) for running large models in mixed VRAM-and-DRAM mode; this mode is slower, but it makes very large models deployable on ordinary home computers. ☆69 · Updated 2 months ago
- Run DeepSeek-R1 GGUFs on KTransformers. ☆236 · Updated 3 months ago
- Lightweight inference server for OpenVINO. ☆187 · Updated last week
- Adds MI25/MI50/MI60 support to Triton 3.2.0. ☆12 · Updated 2 months ago
- ROCm library files for gfx1103 and other AMD GPU architectures, updated for use on Windows. ☆535 · Updated 4 months ago
- ☆340 · Updated 2 months ago
- The main repository for building Pascal-compatible versions of ML applications and libraries. ☆95 · Updated last month
- A Python package extending official PyTorch to easily obtain extra performance on Intel platforms. ☆47 · Updated 6 months ago
- LLM voice-chat project connecting a locally deployed Ollama with ChatTTS to enable spoken conversations with an LLM. ☆62 · Updated 10 months ago
- ☆38 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance. ☆608 · Updated this week
- Implements harmful/harmless refusal removal using pure HF Transformers. ☆903 · Updated last year
- ☆154 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆84 · Updated this week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs. ☆408 · Updated last week
- ☆96 · Updated last week
- ☆90 · Updated 3 months ago
- A text-to-speech and speech-to-text server compatible with the OpenAI API, supporting Whisper, FunASR, Bark, and CosyVoice backends. ☆124 · Updated 2 weeks ago
- ☆27 · Updated last month
- Ollama model registry mirror / accelerator, letting Ollama pull/download models faster from ModelScope. ☆95 · Updated 2 months ago
- ☆250 · Updated 3 weeks ago
- See how to play with ROCm and run it on AMD GPUs! ☆31 · Updated last month
- vGPU-Unlock-patcher. ☆469 · Updated 10 months ago
- NVIDIA Linux open GPU with P2P support. ☆26 · Updated last month
- CUDA on AMD GPUs. ☆517 · Updated last month