vllm-project / vllm-project.github.io

☆23

Alternatives and similar repositories for vllm-project.github.io

Users who are interested in vllm-project.github.io are comparing it to the libraries listed below.

sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆307 Updated last week
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131 Updated last month
AI-Hypercomputer / inference-benchmark
☆17 Updated 4 months ago
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆820 Updated this week
vllm-project / recipes
Common recipes to run vLLM
☆224 Updated this week
ovg-project / kvcached
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
☆628 Updated last week
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆229 Updated last week
sgl-project / rbg
A workload for deploying LLM inference services on Kubernetes
☆99 Updated last week
tensorchord / deepseek-api-arena
A benchmarking tool for comparing different LLM API providers' DeepSeek model deployments.
☆30 Updated 7 months ago
InftyAI / Awesome-LLMOps
🎉 An awesome & curated list of best LLMOps tools.
☆167 Updated last month
run-ai / runai-model-streamer
☆267 Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆712 Updated this week
leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆450 Updated this week
AlibabaPAI / llumnix
Efficient and easy multi-instance LLM serving
☆510 Updated 2 months ago
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆509 Updated this week
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆700 Updated this week
ServerlessLLM / ServerlessLLM
Serverless LLM Serving for Everyone.
☆585 Updated last week
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆299 Updated this week
perplexityai / pplx-kernels
Perplexity GPU Kernels
☆528 Updated last week
coreweave / nccl-tests
NVIDIA NCCL Tests for Distributed Training
☆123 Updated this week
NVIDIA / kvpress
LLM KV cache compression made easy
☆680 Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆65 Updated this week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆105 Updated this week
LMCache / lmcache-vllm
The driver for LMCache core to run in vLLM
☆56 Updated 9 months ago
volcengine / veScale
A PyTorch Native LLM Training Framework
☆884 Updated 2 months ago
imbue-ai / cluster-health
☆316 Updated last year
llm-d / llm-d-kv-cache-manager
Distributed KV cache coordinator
☆85 Updated this week
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆446 Updated last week
AI-Hypercomputer / JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…
☆388 Updated 5 months ago
meta-pytorch / torchcomms
torchcomms: a modern PyTorch communications API
☆278 Updated this week