SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆37 Updated 2 weeks ago
Alternatives and similar repositories for vllm-windows
Users who are interested in vllm-windows are comparing it to the libraries listed below.
- ☆48 Updated 6 months ago
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossi… ☆30 Updated 3 weeks ago
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton ☆15 Updated this week
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆20 Updated last month
- Fast and memory-efficient exact attention - Windows wheels ☆38 Updated 2 weeks ago
- A Lightweight Gradio Web interface for Text-to-Audio Generation utilising SAO1.0 ☆51 Updated 11 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆78 Updated 7 months ago
- ☆108 Updated 2 months ago
- 8-bit CUDA functions for PyTorch ☆25 Updated last year
- A collection of compiled wheels for deepspeed built for python 3.10 and 3.11 with support for cuda 11.8 and 12.1 for Windows ☆65 Updated 8 months ago
- An unsupervised model merging algorithm for Transformers-based language models. ☆105 Updated last year
- Loader extension for tabbyAPI in SillyTavern ☆24 Updated 9 months ago
- Cosmos1GP for the GPU Poor by DeepBeepMeep ☆64 Updated 3 months ago
- AI Media processing using ComfyUI ☆131 Updated last week
- Interact with an AI game engine that keeps building its rules and world as you play, adapted to your gameplay. ☆44 Updated this week
- ☆35 Updated last week
- ExLlamaV2 nodes for ComfyUI. ☆118 Updated 5 months ago
- Service for testing out the new Qwen2.5 omni model ☆48 Updated 2 weeks ago
- Bridging wrapper for llama-cpp-python within ComfyUI ☆57 Updated 10 months ago
- A system for Prompt generation to improve Text-to-Image performance. ☆78 Updated 2 months ago
- ☆52 Updated 6 months ago
- ☆35 Updated 3 months ago
- Wan2.1, quantized and optimized so it fits on your 3090/4090 ☆31 Updated 2 months ago
- For loading and running Pixtral models ☆76 Updated 3 months ago
- ACE-Step: A Step Towards Music Generation Foundation Model ☆30 Updated this week
- Testbed for the fastest SD pipelines ☆35 Updated last year
- Fast and memory-efficient exact attention - Windows wheels ☆33 Updated last year
- CLIP Interrogator, fully in HuggingFace Transformers 🤗, with LongCLIP & CLIP's own words and / or *your* own words! ☆16 Updated 3 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆30 Updated this week
- Diffusion_TTS extension for booga ☆67 Updated 10 months ago