SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆294 · Updated 2 months ago
Alternatives and similar repositories for vllm-windows
Users who are interested in vllm-windows are comparing it to the libraries listed below.
- Deepspeed windows information ☆44 · Updated last year
- ☆135 · Updated 10 months ago
- ☆49 · Updated last week
- Service for testing out the new Qwen2.5 omni model ☆63 · Updated 9 months ago
- automatically quant GGUF models ☆219 · Updated last month
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs ☆171 · Updated 3 weeks ago
- ☆44 · Updated last year
- A collection of compiled wheels for deepspeed built for python 3.10 and 3.11 with support for cuda 11.8 and 12.1 for Windows ☆86 · Updated last year
- ☆128 · Updated last year
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/Cuda with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati… ☆156 · Updated this week
- stable-diffusion.cpp bindings for python ☆97 · Updated this week
- Quantized text-audio foundation model from Boson AI ☆43 · Updated 5 months ago
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton ☆40 · Updated 8 months ago
- SoTA open-source TTS ☆150 · Updated last month
- Docker compose to run vLLM on Windows ☆114 · Updated 2 years ago
- OminiControl for the GPU Poor ☆39 · Updated last year
- Make abliterated models with transformers, easy and fast ☆114 · Updated 2 months ago
- gguf (GPT-Generated Unified Format) connector ☆50 · Updated 3 weeks ago
- Fast and memory-efficient exact attention - Windows wheels ☆36 · Updated 9 months ago
- A ComfyUI custom node integration for multi-engine multi-language Text-to-Speech and Voice Conversion. Supports: RVC, Qwen3-TTS, Cozy Voi… ☆616 · Updated last week
- ☆230 · Updated 9 months ago
- Wan2.1, quantized and optimized so it fits on your 3090/4090 ☆34 · Updated 11 months ago
- PyQt6 1st try ☆295 · Updated last year
- Gradio UI for YuE ☆89 · Updated 10 months ago
- 8-bit CUDA functions for PyTorch ☆26 · Updated 2 years ago
- Attempts to bypass AI Image detection by employing various methods such as: Noise Injection, FFT Smoothing, FFT Matching, Pixel Perturbat… ☆193 · Updated 4 months ago
- Lightweight Gradio based WebUI for orpheusTTS - WSL / Linux [CUDA] ☆105 · Updated 2 months ago
- API server for VibeVoice ☆26 · Updated 4 months ago
- Cosmos1GP for the GPU Poor by DeepBeepMeep ☆81 · Updated 11 months ago
- ik_llama.cpp's Thireus fork with release builds for macOS/Windows/Ubuntu CPU and Windows CUDA ☆53 · Updated this week