vllm-project/vllm-skills

Agent skills for vLLM

☆89

Alternatives and similar repositories for vllm-skills

Users who are interested in vllm-skills are comparing it to the repositories listed below.

hsliuustc0106 / vllm-omni-skills
a collection of skills for vllm-omni
☆82 · Updated this week
vllm-project / vllm-daily
vLLM Daily Summarization of Merged PRs
☆51 · Updated this week
vllm-project / router
A high-performance and light-weight router for vLLM large scale deployment
☆321 · Jul 13, 2026 · Updated last week
vllm-project / dllm-plugin
vLLM plugin for block-based diffusion language model (dLLM) support
☆24 · May 25, 2026 · Updated last month
vllm-project / agentic-api
Stateful API logic for agentic applications using vLLM
☆51 · Updated this week
Rising0321 / nano-vllm-omni
A lightweight `vLLM-Omni`-style diffusion implementation built around `Wan2.2-TI2V-5B-Diffusers` inspired from nano-vllm
☆55 · May 25, 2026 · Updated last month
alibaba / tair-kvcache
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation(HiSi…
☆215 · Updated this week
vllm-project / vllm-xpu-kernels
The vLLM XPU kernels for Intel GPU
☆55 · Updated this week
vllm-project / vime
An LLM post-training framework with vLLM for RL Scaling
☆379 · Updated this week
llm-d / llm-d-routing-sidecar
Incubating P/D sidecar for llm-d
☆17 · Nov 13, 2025 · Updated 8 months ago
verl-project / verl-vla
A unified VLA post-training framework for human-in-the-loop data collection, fine-tuning, and reinforcement learning.
☆39 · Updated this week
gofreelee / SpaceServe
☆31 · Jul 13, 2026 · Updated last week
KuangjuX / AttnLink
An experimental communicating attention kernel based on DeepEP.
☆34 · Jul 29, 2025 · Updated 11 months ago
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,579 · Jul 14, 2026 · Updated last week
vllm-project / vllm-omni
A framework for efficient model inference with omni-modality models
☆5,643 · Updated this week
novitalabs / pegaflow
High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S…
☆180 · Updated this week
kaori-seasons / data-skill-hub
A skills library for data professionals, for the author's everyday personal use
☆15 · Jun 17, 2026 · Updated last month
LMCache / LMBenchmark
Systematic and comprehensive benchmarks for LLM systems.
☆62 · Jan 28, 2026 · Updated 5 months ago
wassemgtk / MegaScale-Infer-Prototyp
Prototyp MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
☆31 · Apr 4, 2025 · Updated last year
InternLM / Kernel-Smith
☆26 · Mar 31, 2026 · Updated 3 months ago
bytedance / InfiniStore
KV cache store for distributed LLM inference
☆425 · Nov 13, 2025 · Updated 8 months ago
Tencent / hpc-ops
High Performance LLM Inference Operator Library
☆1,052 · Updated this week
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆1,415 · Updated this week
BBuf / AI-Infra-Auto-Driven-SKILLS
☆692 · Jul 14, 2026 · Updated last week
NoakLiu / PiKV
PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]
☆61 · Jun 12, 2026 · Updated last month
vllm-project / vllm-bench
High-performance Rust benchmark client for vLLM serving endpoints.
☆48 · Jul 9, 2026 · Updated last week
ai-dynamo / aiconfigurator
Offline optimization of your disaggregated Dynamo graph
☆369 · Updated this week
vllm-project / recipes
Common recipes to run vLLM
☆923 · Updated this week
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆167 · Updated this week
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation & Optimization via Deep Agents
☆488 · Updated this week
sgl-project / sglang-omni
SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models
☆656 · Updated this week
Project-HAMi / volcano-vgpu-device-plugin
Device-plugin for volcano vgpu which support hard resource isolation
☆161 · Jun 9, 2026 · Updated last month
cornserve-ai / cornserve
Easy, Fast, and Scalable Multimodal AI
☆128 · Jun 2, 2026 · Updated last month
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,941 · Updated this week
NVIDIA / srt-slurm
NVIDIA Inference Benchmarks provide recipes in ready-to-use templates for evaluating platform speed. Validate your platform across speci…
☆40 · Updated this week
qhfan / UniPrefill
Implementation of "UniPrefill: Universal Long-Context Prefill Acceleration via Block-wise Dynamic Sparsification"
☆41 · May 8, 2026 · Updated 2 months ago
sgl-project / sgl-kernel-npu
SGLang kernel library for NPU
☆166 · Updated this week
shengshu-ai / TurboServe
TurboServe: Serving Streaming Video Generation Efficiently and Economically
☆34 · Jul 12, 2026 · Updated last week
vbdi / epdserve
[ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation
☆24 · Jul 11, 2026 · Updated last week