AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
☆368 · Jun 13, 2026 · Updated this week
Alternatives and similar repositories for aiperf
Users interested in aiperf are comparing it to the libraries listed below.
- Kubernetes CSI Driver for serving OCI model artifacts ☆27 · May 25, 2026 · Updated 3 weeks ago
- ☆146 · Jun 9, 2026 · Updated last week
- ☆13 · Jun 18, 2024 · Updated 2 years ago
- High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆142 · Updated this week
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- OCI container images. A Slinky project. ☆22 · Jun 10, 2026 · Updated last week
- Distributed KV cache scheduling & offloading libraries ☆156 · Updated this week
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- Open Source Continuous Inference Benchmark Research Platform Kimi K2.7-Code, DeepSeekv4, GLM5 - GB200 NVL72 vs MI355X vs B200 vs GB300 N… ☆1,114 · Updated this week
- ☆68 · Updated this week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆303 · May 14, 2026 · Updated last month
- NVIDIA Inference Xfer Library (NIXL) ☆1,079 · Updated this week
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated 4 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆712 · Apr 8, 2026 · Updated 2 months ago
- Security-native LLM system for AI-generated application security. ☆263 · Jun 4, 2026 · Updated 2 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework ☆7,248 · Updated this week
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆109 · Dec 2, 2025 · Updated 6 months ago
- Offline optimization of your disaggregated Dynamo graph ☆335 · Updated this week
- ☆25 · Jun 24, 2022 · Updated 3 years ago
- A Kubernetes Operator to manage Node OS customizations. ☆57 · Updated this week
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆224 · Updated this week
- llm-d helm charts and deployment examples ☆58 · May 1, 2026 · Updated last month
- Tools for generating TPC-* datasets ☆32 · Jun 23, 2024 · Updated last year
- This CLI tool and Python3 module collects the current system state for documentation ☆26 · Apr 9, 2026 · Updated 2 months ago
- Like `kubectl get all`, but get really all resources ☆33 · Updated this week
- Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆468 · Updated this week
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆68 · Feb 12, 2026 · Updated 4 months ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆19 · Jan 12, 2026 · Updated 5 months ago
- ☆21 · Mar 11, 2026 · Updated 3 months ago
- ☆107 · Sep 9, 2024 · Updated last year
- Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing conf… ☆16 · Jun 11, 2026 · Updated last week
- A high-performance and light-weight router for vLLM large scale deployment ☆268 · May 6, 2026 · Updated last month
- Estimate MFU for DeepSeekV3 ☆26 · Jan 5, 2025 · Updated last year
- SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models ☆488 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆30 · Updated this week
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling ☆27 · Updated this week
- Environments by the Prime Intellect Research Team ☆60 · Updated this week
- ☆16 · Jun 3, 2026 · Updated 2 weeks ago
- ☆19 · Apr 21, 2026 · Updated last month