A high-throughput and memory-efficient inference and serving engine for LLMs
☆43 · Apr 27, 2026 · Updated last week
Alternatives and similar repositories for vllm
Users interested in vllm are comparing it to the libraries listed below.
- ☆26 · Apr 7, 2026 · Updated 3 weeks ago
- ☆26 · Oct 2, 2023 · Updated 2 years ago
- Large language models designed for formal theorem proving through tool-integrated reasoning. ☆34 · Aug 13, 2025 · Updated 8 months ago
- Unofficial implementation of DreamTalk in ComfyUI ☆12 · Aug 15, 2024 · Updated last year
- A flash implementation of hash join in C++ ☆32 · Sep 15, 2025 · Updated 7 months ago
- Service for quickly aliasing and redirecting to long URLs ☆25 · Apr 26, 2023 · Updated 3 years ago
- Fast and memory-efficient exact attention ☆22 · Apr 10, 2026 · Updated 3 weeks ago
- ☆12 · Sep 28, 2025 · Updated 7 months ago
- Test-time adaptation for speech recognition model by single utterance. The official implementation of "Listen, Adapt, Better WER: Source-… ☆22 · Apr 1, 2022 · Updated 4 years ago
- AgentParse is a high-performance parsing library designed to map various structured data formats (such as Pydantic models, JSON, YAML, an… ☆18 · Oct 13, 2025 · Updated 6 months ago
- SGLang Kernel Wheel Index ☆22 · Updated this week
- ☆15 · May 17, 2024 · Updated last year
- Depict GPU memory footprint during DNN training of PyTorch ☆11 · Nov 17, 2022 · Updated 3 years ago
- Run AuraFlow on Replicate ☆14 · Jul 12, 2024 · Updated last year
- A tiny server to run local inference on an MLX model in the style of OpenAI ☆13 · Jan 31, 2024 · Updated 2 years ago
- ☆11 · Aug 22, 2023 · Updated 2 years ago
- ☆50 · Jan 27, 2025 · Updated last year
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- An open source community implementation of the model MELLE from the paper "Autoregressive Speech Synthesis without Vector Quantization" ☆14 · Apr 13, 2026 · Updated 3 weeks ago
- ☆12 · Mar 13, 2023 · Updated 3 years ago
- ☆11 · May 13, 2024 · Updated last year
- ComfyUI nodes to use DiLightNet ☆11 · Oct 6, 2024 · Updated last year
- ☆66 · Apr 26, 2025 · Updated last year
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆22 · Updated this week
- ☕️ A vscode extension for netron, supporting *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc. ☆14 · Jun 4, 2023 · Updated 2 years ago
- A TypeScript Attester using Turnstile for the Privacy Pass Authentication Protocol ☆15 · Apr 24, 2026 · Updated last week
- This repo holds the research projects of our lab. ☆11 · Jan 20, 2024 · Updated 2 years ago
- A bookshelf plugin which handles relationships. ☆22 · Updated this week
- KV cache compression for high-throughput LLM inference ☆157 · Feb 5, 2025 · Updated last year
- Coder Desktop application for Windows ☆24 · Feb 24, 2026 · Updated 2 months ago
- ☆17 · Aug 5, 2025 · Updated 9 months ago
- Pyodide is a Python distribution for the browser and Node.js based on WebAssembly ☆17 · Apr 23, 2026 · Updated last week
- Sequence-level 1F1B schedule for LLMs ☆19 · Jun 4, 2024 · Updated last year
- ☆13 · Mar 27, 2023 · Updated 3 years ago
- YUV.AI Developers AI Trends - Beautiful Gen AI & ML news aggregator with Apple-inspired design ☆101 · Jan 4, 2026 · Updated 4 months ago
- Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend ☆120 · Updated this week
- cutile kernel examples ☆48 · Apr 3, 2026 · Updated last month
- Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation (HiSi… ☆159 · Updated this week
- ☆21 · Feb 2, 2024 · Updated 2 years ago