ai-dynamo/aiperf

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/ai-dynamo/aiperf)

ai-dynamo / aiperf

AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

☆475

Alternatives and similar repositories for aiperf

Users that are interested in aiperf are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

ai-dynamo / modelexpress
View on GitHub
Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…
☆99Updated this week
ishandhanani / srt-slurm
View on GitHub
Benchmark SGLang on SLURM
☆24Apr 20, 2026Updated 3 months ago
ai-dynamo / aiconfigurator
View on GitHub
Offline optimization of your disaggregated Dynamo graph
☆379Updated this week
ai-dynamo / nixl
View on GitHub
NVIDIA Inference Xfer Library (NIXL)
☆1,157Updated this week
ai-dynamo / dynamo
View on GitHub
A Datacenter Scale Distributed Inference Serving Framework
☆7,608Updated this week
Simple, predictable pricing with DigitalOcean hosting • Ad
Always know what you'll pay with monthly caps and flat pricing. Enterprise-grade infrastructure trusted by 600k+ customers.
triton-inference-server / perf_analyzer
View on GitHub
☆151Jul 23, 2026Updated last week
SemiAnalysisAI / InferenceX
View on GitHub
Open Source Continuous Inference Benchmark Research Platform — Kimi K3 2.8T, MiniMax M3, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200…
☆1,294Updated this week
NVIDIA / srt-slurm
View on GitHub
NVIDIA Inference Benchmarks provide recipes in ready-to-use templates for evaluating platform speed. Validate your platform across speci…
☆40Updated this week
ai-dynamo / grove
View on GitHub
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆247Updated this week
vllm-project / guidellm
View on GitHub
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
☆1,442Updated this week
kvcache-ai / Mooncake
View on GitHub
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆6,083Updated this week
vllm-project / production-stack
View on GitHub
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
☆2,477Updated this week
lightseekorg / tokenspeed
View on GitHub
TokenSpeed is a speed-of-light LLM inference engine.
☆1,751Updated this week
llm-d / llm-d
View on GitHub
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,908Updated this week
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
lightseekorg / TorchSpec
View on GitHub
A PyTorch native library for training speculative decoding models
☆212Updated this week
LMCache / lmcache-agent-trace
View on GitHub
Agent application/benchmark/workload traces should be placed here.
☆15Apr 13, 2026Updated 3 months ago
fw-ai / benchmark
View on GitHub
Benchmark suite for LLMs from Fireworks.ai
☆111Updated this week
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆6,062Updated this week
callanjfox / kv-cache-tester
View on GitHub
☆41Jul 12, 2026Updated 2 weeks ago
vllm-project / router
View on GitHub
A high-performance and light-weight router for vLLM large scale deployment
☆333Updated this week
lightseekorg / smg
View on GitHub
Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across vLLM, TRT-LLM, TokenSpeed, SGLang, OpenAI, Gemini &…
☆423Updated this week
NVIDIA / Model-Optimizer
View on GitHub
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative…
☆3,341Updated this week
LMCache / LMCache
View on GitHub
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
☆10,938Updated this week
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
NVIDIA / NVSentinel
View on GitHub
NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated compu…
☆354Updated this week
kubernetes-sigs / inference-perf
View on GitHub
GenAI inference performance benchmarking tool
☆215Updated this week
vllm-project / llm-compressor
View on GitHub
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆3,602Updated this week
ome-projects / ome
View on GitHub
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T…
☆481Updated this week
NVIDIA / nvbandwidth
View on GitHub
A tool for bandwidth measurements on NVIDIA GPUs.
☆740Updated this week
kai-scheduler / KAI-Scheduler
View on GitHub
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆1,426Updated this week
alibaba / tair-kvcache
View on GitHub
Alibaba Cloud's high-performance KVCache system for LLM inference, with components for global cache management, inference simulation(HiSi…
☆219Updated this week
NVIDIA / k8s-nim-operator
View on GitHub
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆159Updated this week
sgl-project / genai-bench
View on GitHub
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆316Jul 22, 2026Updated last week
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
sgl-project / sglang
View on GitHub
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,926Updated this week
taco-project / FlexKV
View on GitHub
☆310Updated this week
sgl-project / rbg
View on GitHub
A workload for deploying LLM inference services on Kubernetes
☆266Updated this week
NVIDIA / DCGM
View on GitHub
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
☆773Jul 6, 2026Updated 3 weeks ago
NVIDIA / topograph
View on GitHub
A toolkit for discovering cluster network topology.
☆149Updated this week
meta-pytorch / MSLK
View on GitHub
MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…
☆121Updated this week
run-ai / runai-model-streamer
View on GitHub
☆331Updated this week