SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆42 Updated last week
Alternatives and similar repositories for vllm-windows
Users who are interested in vllm-windows are comparing it to the libraries listed below
- Development repository for the Triton language and compiler ☆34 Updated 7 months ago
- ☆50 Updated 7 months ago
- ☆112 Updated 2 months ago
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossi… ☆34 Updated this week
- An extension to use Kokoro TTS in text generation webui ☆20 Updated last month
- ☆38 Updated 2 months ago
- 8-bit CUDA functions for PyTorch ☆25 Updated last year
- The heart of The Pulsar App: fast, secure, and shared inference with a modern UI ☆56 Updated 6 months ago
- Interact with an AI game engine that keeps building its rules and world as you play, adapted to your gameplay. ☆45 Updated this week
- Cosmos1GP for the GPU Poor by DeepBeepMeep ☆66 Updated 3 months ago
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton ☆24 Updated 3 weeks ago
- A simple wrapper around "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" that provides an OpenAI-compatibl… ☆13 Updated 4 months ago
- Loader extension for tabbyAPI in SillyTavern ☆25 Updated 10 months ago
- A Lightweight Gradio Web interface for Text-to-Audio Generation utilising SAO1.0 ☆53 Updated 11 months ago
- A simple framework for using a local Koboldcpp LLM to help with story-writing ☆21 Updated last year
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs ☆103 Updated last month
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆25 Updated last month
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆40 Updated 2 weeks ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 Updated 6 months ago
- A system for Prompt generation to improve Text-to-Image performance. ☆79 Updated 2 months ago
- LCM test nodes for ComfyUI ☆63 Updated last year
- Wan2.1, quantized and optimized so it fits on your 3090/4090 ☆31 Updated 3 months ago
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆21 Updated 2 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆79 Updated 7 months ago
- AI Media processing using ComfyUI ☆136 Updated this week
- ☆49 Updated 3 weeks ago
- Gradio UI for YuE ☆56 Updated 2 months ago
- Create text chunks which end at natural stopping points without using a tokenizer ☆25 Updated 2 months ago
- A collection of compiled wheels for deepspeed built for python 3.10 and 3.11 with support for cuda 11.8 and 12.1 for Windows ☆66 Updated 8 months ago
- Automatically quantize GGUF models ☆181 Updated this week