SystemPanic / vllm-windows
A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆229 Updated last month
Alternatives and similar repositories for vllm-windows
Users who are interested in vllm-windows are comparing it to the libraries listed below.
- Service for testing out the new Qwen2.5 omni model ☆61 Updated 6 months ago
- Deepspeed windows information ☆42 Updated last year
- Quantized text-audio foundation model from Boson AI ☆41 Updated 3 months ago
- ☆42 Updated 9 months ago
- ☆124 Updated last year
- This is a pre-built wheel of Triton 3.3.0 for Windows with Nvidia only + Proton ☆39 Updated 6 months ago
- A collection of compiled wheels for deepspeed built for python 3.10 and 3.11 with support for cuda 11.8 and 12.1 for Windows ☆77 Updated last year
- ☆128 Updated 8 months ago
- automatically quant GGUF models ☆214 Updated 3 weeks ago
- Fast and memory-efficient exact attention - Windows wheels ☆36 Updated 6 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆35 Updated this week
- Wan2.1, quantized and optimized so it fits on your 3090/4090 ☆34 Updated 8 months ago
- ☆43 Updated 9 months ago
- Make abliterated models with transformers, easy and fast ☆92 Updated 7 months ago
- This is a simple node for comfyUI that accesses any openAI API server the user specifies and enables simple text generation with a string… ☆27 Updated last year
- Free ComfyUI Workflows ☆38 Updated 2 months ago
- SoTA open-source TTS ☆117 Updated last month
- For loading and running Pixtral models ☆77 Updated 9 months ago
- (Windows/Linux/MacOS) Local WebUI with neural network models (Text, Image, Video, 3D, Audio) on python (Gradio interface). Translated on … ☆107 Updated last week
- OminiControl for the GPU Poor ☆39 Updated 9 months ago
- ☆50 Updated last year
- gguf (GPT-Generated Unified Format) connector ☆44 Updated this week
- Development repository for the Triton language and compiler ☆33 Updated last year
- ComfyUI custom node for the VibeVoice TTS. Expressive, long-form, multi-speaker conversational audio ☆511 Updated last month
- A simple wrapper around "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" that provides an OpenAI-compatibl… ☆15 Updated 9 months ago
- 8-bit CUDA functions for PyTorch ☆25 Updated 2 years ago
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs ☆155 Updated 3 weeks ago
- ☆223 Updated 6 months ago
- ComfyUI Nodes for SongBloom ☆189 Updated 2 months ago
- Cosmos1GP for the GPU Poor by DeepBeepMeep ☆81 Updated 9 months ago