SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆294 · Updated 2 months ago
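The engine itself is a Windows build of vLLM, so as context for the comparisons below, here is a minimal offline-inference sketch assuming the upstream vLLM Python API (`LLM`, `SamplingParams`); the model name is purely illustrative.

```python
from vllm import LLM, SamplingParams

# Illustrative model name; any supported Hugging Face causal LM would do.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Batch of one prompt; vLLM returns one RequestOutput per prompt.
outputs = llm.generate(["Explain paged attention in one sentence."], params)
print(outputs[0].outputs[0].text)
```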
Alternatives and similar repositories for vllm-windows
Users interested in vllm-windows are comparing it to the repositories listed below.
- ☆48 · Updated last week
- Service for testing out the new Qwen2.5 omni model ☆63 · Updated 9 months ago
- Deepspeed windows information ☆44 · Updated last year
- ☆128 · Updated last year
- ☆135 · Updated 10 months ago
- A collection of compiled wheels for deepspeed built for python 3.10 and 3.11 with support for cuda 11.8 and 12.1 for Windows ☆86 · Updated last year
- stable-diffusion.cpp bindings for python ☆97 · Updated this week
- Lightweight Gradio based WebUI for orpheusTTS - WSL / Linux [CUDA] ☆105 · Updated 2 months ago
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs ☆171 · Updated 2 weeks ago
- Make abliterated models with transformers, easy and fast ☆114 · Updated last month
- automatically quant GGUF models ☆219 · Updated last month
- gguf (GPT-Generated Unified Format) connector ☆50 · Updated 3 weeks ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/Cuda with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆154 · Updated last week
- ☆229 · Updated 9 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆49 · Updated last week
- Quantized text-audio foundation model from Boson AI ☆43 · Updated 5 months ago
- SoTA open-source TTS ☆150 · Updated last month
- LM inference server implementation based on *.cpp. ☆295 · Updated 2 months ago
- PyQt6 1st try ☆294 · Updated last year
- API server for VibeVoice ☆26 · Updated 4 months ago
- Python bindings for llama.cpp ☆174 · Updated last week
- ☆51 · Updated 11 months ago
- Free ComfyUI Workflows ☆46 · Updated last week
- A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Qwen3-TTS, Cozy Voi… ☆616 · Updated this week
- ☆44 · Updated 11 months ago
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆23 · Updated 10 months ago
- Docker compose to run vLLM on Windows (a sample client call against the resulting endpoint is sketched after this list) ☆114 · Updated 2 years ago
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton ☆40 · Updated 8 months ago
- A pipeline parallel training script for LLMs. ☆166 · Updated 9 months ago
- ☆51 · Updated last year
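For the Docker compose entry above, a minimal sketch of querying the running container, assuming it exposes vLLM's OpenAI-compatible server on the default port 8000; the base URL, API key, model name, and prompt are all illustrative.

```python
from openai import OpenAI

# Assumed local endpoint exposed by a vLLM container; adjust host/port to match your compose file.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # must match the model the server was launched with
    messages=[{"role": "user", "content": "Say hello from a Windows host."}],
)
print(response.choices[0].message.content)
```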