SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆262 · Updated last month
Alternatives and similar repositories for vllm-windows
Users interested in vllm-windows are comparing it to the repositories listed below.
- ☆127 · Updated last year
- Service for testing out the new Qwen2.5-Omni model · ☆61 · Updated 8 months ago
- ☆44 · Updated 10 months ago
- ☆135 · Updated 9 months ago
- Automatically quantize GGUF models · ☆218 · Updated last week
- Quantized text-audio foundation model from Boson AI · ☆41 · Updated 4 months ago
- Development repository for the Triton language and compiler · ☆34 · Updated last year
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds: flexible model sizing and lowes… · ☆71 · Updated this week
- ☆44 · Updated 10 months ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/CUDA with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… · ☆154 · Updated last week
- DeepSpeed Windows information · ☆44 · Updated last year
- SoTA open-source TTS · ☆137 · Updated 2 weeks ago
- ☆51 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance · ☆41 · Updated this week
- Make abliterated models with transformers, easy and fast · ☆111 · Updated 3 weeks ago
- Pre-built wheel of Triton 3.3.0 for Windows (NVIDIA only) with Proton · ☆40 · Updated 7 months ago
- ACE-Step: A Step Towards Music Generation Foundation Model · ☆46 · Updated 7 months ago
- Docker Compose setup to run vLLM on Windows · ☆112 · Updated 2 years ago
- PyQt6 1st try · ☆293 · Updated 11 months ago
- Game Companion AI is an advanced application designed to enhance the gaming experience by providing real-time analysis and interpretation… · ☆53 · Updated last year
- A pipeline-parallel training script for LLMs · ☆164 · Updated 8 months ago
- Lightweight Gradio-based WebUI for orpheusTTS (WSL / Linux, CUDA) · ☆105 · Updated last month
- stable-diffusion.cpp bindings for Python · ☆87 · Updated 2 weeks ago
- Wan2.1, quantized and optimized so it fits on your 3090/4090 · ☆34 · Updated 10 months ago
- Frontier Open-Source Text-to-Speech · ☆95 · Updated 4 months ago
- Fast and memory-efficient exact attention (Windows wheels) · ☆36 · Updated 8 months ago
- Orpheus Chat WebUI · ☆74 · Updated 9 months ago
- ☆51 · Updated 10 months ago
- A collection of compiled wheels for DeepSpeed, built for Python 3.10 and 3.11 with support for CUDA 11.8 and 12.1 on Windows · ☆83 · Updated last year
- Unofficial WIP LoRA fine-tuning repository for VibeVoice · ☆302 · Updated 3 months ago