SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆65Updated last week
Alternatives and similar repositories for vllm-windows
Users that are interested in vllm-windows are comparing it to the libraries listed below
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossi…☆39Updated 3 weeks ago
- Wan2.1, quantized and optimized so it fits on your 3090/4090☆31Updated 4 months ago
- ☆50Updated 7 months ago
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton☆27Updated last month
- Cosmos1GP for the GPU Poor by DeepBeepMeep☆65Updated 4 months ago
- ☆39Updated 3 months ago
- Development repository for the Triton language and compiler☆34Updated 8 months ago
- 8-bit CUDA functions for PyTorch☆25Updated last year
- A simple wrapper around "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" that provides an OpenAI-compatibl…☆13Updated 4 months ago
- ☆55Updated 7 months ago
- ☆52Updated last month
- A lightweight cluster manager that turns your small fleet of nodes into one powerful computer, using Docker for environment consistency w…☆51Updated last week
- Makes your prompts better both Locally & Online, UI & NO UI☆42Updated 8 months ago
- ☆114Updated 3 months ago
- Run Orpheus 3B Locally with Gradio UI, Standalone App☆22Updated 2 months ago
- ☆36Updated 4 months ago
- This extension enhances the capabilities of textgen-webui by integrating advanced vision models, allowing users to have contextualized co…☆54Updated 8 months ago
- ☆22Updated last year
- A pipeline parallel training script for LLMs.☆150Updated last month
- ☆22Updated 8 months ago
- Bridging wrapper for llama-cpp-python within ComfyUI☆59Updated 11 months ago
- ☆19Updated last year
- Python script that converts PyTorch pth and pt files to safetensors format☆23Updated 2 months ago
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run!☆25Updated last month
- This is a simple node for comfyUI that accesses any openAI API server the user specifies and enables simple text generation with a string…☆28Updated last year
- Interact with an AI game engine that keeps building its rules and world as you play, adapted to your gameplay.☆46Updated 2 weeks ago
- Gradio UI for YuE☆60Updated 2 months ago
- ☆23Updated 8 months ago
- Fast and memory-efficient exact attention - Windows wheels☆38Updated last month
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs☆113Updated 2 weeks ago