A high-throughput and memory-efficient inference and serving engine for LLMs (Windows build & kernels)
☆546Jun 15, 2026Updated 2 weeks ago
Alternatives and similar repositories for vllm-windows
Users that are interested in vllm-windows are comparing it to the libraries listed below.
- ☆22Nov 28, 2025Updated 7 months ago
- Deepspeed windows information☆44Mar 9, 2024Updated 2 years ago
- One-click Qwen3.6-27B inference on Windows. 158 tok/s on RTX 5090, 72 tok/s on RTX 3090. Native, no WSL, no Docker, no telemetry.☆212May 14, 2026Updated last month
- Advanced CLI diffusion inference/training suite based on Musubi Tuner☆40Apr 15, 2026Updated 2 months ago
- Fork of the Triton language and compiler for Windows support and easy installation☆1,945Feb 18, 2026Updated 4 months ago
- ☆18Dec 2, 2024Updated last year
- Custom LM Studio backends — run on legacy CPUs and Vulkan GPUs. AVX1, experimental no-AVX, and more. We were told it wouldn't work… so we…☆74Jun 11, 2026Updated 2 weeks ago
- A repo to quantize diffusion models directly in ComfyUI☆117Jun 14, 2025Updated last year
- A video clipper for Hunyuan video training.☆87Jul 28, 2025Updated 11 months ago
- [ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-t…☆68Jan 19, 2026Updated 5 months ago
- ☆73Updated this week
- Fork of ACE-Step v1.0 for LoRA training with < 10 GB VRAM☆69Feb 3, 2026Updated 4 months ago
- An Extension for Automatic1111 Webui that makes the interface easier to use on mobile (portrait)☆17Apr 16, 2024Updated 2 years ago
- Fine-tuning code for CLIP models☆275Jun 9, 2026Updated 3 weeks ago
- Web application for fine-tuning language models☆16Aug 16, 2025Updated 10 months ago
- ☆67Nov 10, 2025Updated 7 months ago
- [DEPRECATED] Attempts to convert a Flux lora to a Chroma lora☆21Nov 9, 2025Updated 7 months ago
- A comprehensive codebase for training and finetuning Image <> Latent models.☆50Mar 1, 2025Updated last year
- Official code for SA-Solver: Stochastic Adams Solver for Fast Sampling of Diffusion Models (NeurIPS 2023)☆14Mar 4, 2024Updated 2 years ago
- Sparse Autoencoders (SAE) vs CLIP fine-tuning fun.☆18Dec 19, 2024Updated last year
- Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without lossi…☆147Jun 18, 2026Updated last week
- OpenCode Sentinel is a security-enhanced version of OpenCode that allows you to connect only to private AI servers, completely cutting off…☆36Feb 3, 2026Updated 4 months ago
- ☆29Jun 6, 2026Updated 3 weeks ago
- Extension for Forge-based UIs (Forge, reForge, etc) and ComfyUI to replace CFG with Negative Rejection Steering☆16May 16, 2026Updated last month
- Pre-compiled Python wheels for Flash-attention, SageAttention, NATTEN, xFormers, etc.☆699Jun 12, 2026Updated 2 weeks ago
- Croco.Cpp is a fork of KoboldCPP inferring GGML/GGUF models on CPU/Cuda with KoboldAI's UI. It's powered partly by IK_LLama.cpp, and compati…☆174Jun 10, 2026Updated 3 weeks ago
- Performs the entire AI cover generation process with UI☆30Aug 4, 2025Updated 10 months ago
- ☆49Jan 19, 2026Updated 5 months ago
- Extension/Script for Stable Diffusion UI by AUTOMATIC1111 https://github.com/AUTOMATIC1111/stable-diffusion-webui☆19Feb 10, 2023Updated 3 years ago
- React hook for getting the device pixel ratio and reacting to changes☆12Apr 15, 2024Updated 2 years ago
- stable-diffusion-webui-images-browser☆14Jan 12, 2023Updated 3 years ago
- LLM Benchmark Using Project Euler For Coding Challenges☆15Updated this week
- Money tracking - Android App for planning, tracking your spending, monitoring your credit and budget - UNIBO 2016/2017☆12Sep 14, 2017Updated 8 years ago
- 🎬 Transform PPT slides into cinematic web showcases☆77Mar 29, 2026Updated 3 months ago
- ☆16Jun 6, 2025Updated last year
- GUI-focused roop☆15Mar 6, 2025Updated last year
- Code for my collection of predictors/classifiers/etc☆14Jul 18, 2024Updated last year
- HDM model loader for ComfyUI☆42Dec 14, 2025Updated 6 months ago
- Your AI Soul Companion. Self-hosted AI agent across 30+ messaging channels. It can not only serve as an emotional companion in daily life …☆51Jun 4, 2026Updated 3 weeks ago