vllm-project / recipes
Common recipes to run vLLM
☆368 · Updated this week
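For orientation, most recipes for running vLLM reduce to a short offline-inference script built on vLLM's Python API. Below is a minimal sketch of that pattern; the model id, prompts, and sampling settings are illustrative assumptions, not taken from any specific recipe in the repo.

```python
# A minimal sketch of the offline-inference pattern the recipes build on.
# The model id, prompts, and sampling settings are illustrative assumptions,
# not copied from any specific recipe in the repo.
from vllm import LLM, SamplingParams

prompts = [
    "Explain speculative decoding in one sentence.",
    "What does the KV cache store during decoding?",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Any Hugging Face model id that vLLM supports can be used here.
llm = LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```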
Alternatives and similar repositories for recipes
Users interested in recipes are comparing it to the repositories listed below.
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆391 · Updated this week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆228 · Updated this week
- Efficient LLM Inference over Long Sequences ☆394 · Updated 7 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆902 · Updated last week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆275 · Updated this week
- PyTorch Distributed-native training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆288 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆843 · Updated this week
- ☆328 · Updated last week
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆773 · Updated this week
- A high-performance, lightweight router for large-scale vLLM deployments ☆112 · Updated this week
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆394 · Updated 3 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆263 · Updated last week
- vLLM Router ☆54 · Updated last year
- 🎯 An accuracy-first, highly efficient quantization toolkit for LLMs, designed to minimize quality degradation across Weight-Only Quantiza… ☆845 · Updated last week
- Inference server benchmarking tool (see the measurement sketch after this list) ☆142 · Updated 4 months ago
- vLLM performance dashboard ☆41 · Updated last year
- vLLM adapter for a TGIS-compatible gRPC server. ☆51 · Updated this week
- Utils for Unsloth https://github.com/unslothai/unsloth ☆191 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆683 · Updated this week
- ☆206 · Updated 9 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆469 · Updated 8 months ago
- [NeurIPS 2025] A simple extension on top of vLLM that helps you speed up reasoning models without training. ☆220 · Updated 8 months ago
- Materials for learning SGLang ☆743 · Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆89 · Updated this week
- KV cache compression for high-throughput LLM inference ☆153 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆328 · Updated 4 months ago
- Block Diffusion for Ultra-Fast Speculative Decoding ☆533 · Updated last week
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆858 · Updated this week
- ☆56 · Updated last year
- A throughput-oriented high-performance serving framework for LLMs ☆945 · Updated 3 months ago
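Several entries above (genai-bench, the inference server benchmarking tool, the Fireworks benchmark suite, the vLLM performance dashboard) are about benchmarking LLM serving. For orientation only, here is a hypothetical, minimal latency/throughput probe against an OpenAI-compatible endpoint such as the one `vllm serve` exposes; the URL, model id, and prompt are assumptions, and the dedicated tools listed above measure far more (time to first token, concurrency sweeps, percentiles).

```python
# Hypothetical minimal probe against an OpenAI-compatible completions endpoint
# (e.g. a local `vllm serve` instance). URL, model id, and prompt are assumptions.
import time
import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed local server address
MODEL = "facebook/opt-125m"                        # assumed model id

def one_request(prompt: str, max_tokens: int = 64) -> tuple[float, int]:
    """Send one completion request; return (latency in seconds, completion tokens)."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": MODEL, "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    resp.raise_for_status()
    latency = time.perf_counter() - start
    tokens = resp.json()["usage"]["completion_tokens"]
    return latency, tokens

# Fire a few sequential requests and report crude averages.
latencies, token_counts = zip(*(one_request("Say hello.") for _ in range(8)))
print(f"mean latency: {sum(latencies) / len(latencies):.2f} s")
print(f"throughput:   {sum(token_counts) / sum(latencies):.1f} tokens/s")
```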