kubernetes-sigs/inference-perf

GenAI inference performance benchmarking tool

☆212

Alternatives and similar repositories for inference-perf

Users who are interested in inference-perf are comparing it to the repositories listed below.

llm-d / llm-d-benchmark
llm-d benchmark scripts and tooling
☆62 · Updated this week
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆723 · Updated this week
kubernetes-sigs / wg-serving
WG Serving
☆38 · Mar 24, 2026 · Updated 4 months ago
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆769 · Updated this week
llm-d / llm-d-workload-variant-autoscaler
Variant optimization autoscaler for distributed inference workloads
☆52 · Updated this week
llm-d-incubation / llm-d-infra
llm-d helm charts and deployment examples
☆59 · May 1, 2026 · Updated 2 months ago
llm-d / llm-d-router
llm-d Router: The intelligent entry point for inference requests
☆269 · Updated this week
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,875 · Updated this week
DaoCloud / ckube
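A high-performance proxy component for the Kubernetes APIServer. It proxies the APIServer's List requests, while other request types are reverse-proxied directly to the native APIServer. CKube additionally supports pagination, search, and indexing. CKube is also 100% compatible with native kubectl and ku…
☆19 · Sep 16, 2022 · Updated 3 years ago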
llm-d / llm-d-kv-cache
Distributed KV cache scheduling & offloading libraries
☆164 · Updated this week
llm-d-incubation / llm-d-fast-model-actuation
Kubernetes controllers for fast model actuation using vLLM sleep/wake and launcher-based model swapping
☆16 · Updated this week
AI-Hypercomputer / inference-benchmark
☆22 · Mar 11, 2026 · Updated 4 months ago
knoway-dev / knoway
An Envoy inspired, ultimate LLM-first gateway for LLM serving and downstream application developers and enterprises
☆27 · Apr 24, 2025 · Updated last year
ome-projects / ome
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆482 · Updated this week
kubernetes-sigs / dra-example-driver
Example DRA driver that developers can fork and modify to get them started writing their own.
☆136 · Updated this week
llm-d / llm-d-routing-sidecar
Incubating P/D sidecar for llm-d
☆17 · Nov 13, 2025 · Updated 8 months ago
ai-dynamo / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆244 · Updated this week
kubernetes-sigs / dra-driver-nvidia-gpu
DRA Driver for NVIDIA GPUs
☆677 · Updated this week
kai-scheduler / KAI-Scheduler
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆1,409 · Updated this week
kubernetes-sigs / dra-driver-cpu
CPU DRA Driver
☆59 · Updated this week
kerthcet / github-workflow-as-kube
Following the same workflows as Kubernetes. Widely used in InftyAI community.
☆13 · May 31, 2026 · Updated last month
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆170 · Updated this week
kubernetes-sigs / dranet
DRANET is a Kubernetes Network Driver that uses Dynamic Resource Allocation (DRA) to deliver high-performance networking for demanding ap…
☆130 · Updated this week
sgl-project / rbg
A workload for deploying LLM inference services on Kubernetes
☆263 · Updated this week
kubernetes-sigs / kueue
Kubernetes-native Job Queueing
☆2,747 · Updated this week
BaizeAI / dataset
Simplified Data Management and Sharing for Kubernetes
☆18 · Updated this week
Azure / aks-rdma-infiniband
⚡ Guidance, samples, and tools for HPC workloads on AKS clusters with RDMA and InfiniBand support, including GPUDirect RDMA.
☆23 · Updated this week
google / dranet
DRANET is a Kubernetes Network Driver that uses Dynamic Resource Allocation (DRA) to deliver high-performance networking for demanding ap…
☆160 · Dec 9, 2025 · Updated 7 months ago
vllm-project / guidellm
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆1,429 · Updated this week
InftyAI / llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
☆309 · Jan 26, 2026 · Updated 6 months ago
copilot-io / runtime-copilot
The main purpose of runtime copilot is to assist with node runtime management tasks such as configuring registries, upgrading versions, i…
☆13 · May 16, 2023 · Updated 3 years ago
ai-dynamo / modelexpress
Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…
☆91 · Updated this week
chaunceyjiang / fake-gpu
This project is designed to simulate GPU information, making it easier to test scenarios where a GPU is not available.
☆65 · Jan 9, 2026 · Updated 6 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,151 · Updated this week
llm-d / llm-d-model-service
Simplified model deployment on llm-d
☆29 · Jul 2, 2025 · Updated last year
fmperf-project / fmperf
Cloud Native Benchmarking of Foundation Models
☆46 · Jul 31, 2025 · Updated 11 months ago
modelpack / modctl
Command-line tools for managing OCI model artifacts, which are bundled based on Model Spec
☆78 · Updated this week
vllm-project / production-stack
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
☆2,474 · Updated this week
modelpack / model-spec
An Open Standard for Packaging, Distributing and Running LLMs in Cloud-Native Environments
☆218 · Updated this week