vllm-project / recipes

Common recipes to run vLLM

☆245

Alternatives and similar repositories for recipes

Users who are interested in recipes are comparing it to the libraries listed below.

vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆132 Updated last week
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆327 Updated this week
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆254 Updated last week
vllm-project / vllm-omni
A framework for efficient model inference with omni-modality models
☆466 Updated this week
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆730 Updated this week
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392 Updated 5 months ago
vllm-project / dashboard
vLLM performance dashboard
☆38 Updated last year
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆851 Updated last week
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆691 Updated this week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆234 Updated this week
neuralmagic / AutoFP8
☆205 Updated 6 months ago
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆210 Updated 2 weeks ago
triton-inference-server / vllm_backend
☆317 Updated last week
sgl-project / SpecForge
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆498 Updated last week
radixark / miles
☆317 Updated this week
LLM-inference-router / vllm-router
vLLM Router
☆51 Updated last year
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆145 Updated 9 months ago
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130 Updated 2 months ago
NVIDIA-NeMo / Automodel
PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆187 Updated last week
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84 Updated last week
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347 Updated 7 months ago
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆58 Updated 10 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training.
☆209 Updated 6 months ago
huggingface / kernels
Load compute kernels from the Hub
☆337 Updated last week
huggingface / inference-benchmarker
Inference server benchmarking tool
☆130 Updated 2 months ago
Infini-AI-Lab / MagicDec
[ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆132 Updated last year
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆45 Updated this week
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆205 Updated last month
NVIDIA-NeMo / Megatron-Bridge
HuggingFace conversion and training library for Megatron-based models
☆228 Updated this week
deepseek-ai / LPLB
An early research stage MoE load balancer based on linear programming.
☆415 Updated 2 weeks ago