AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
☆320 · May 21, 2026 · Updated last week
Alternatives and similar repositories for aiperf
Users that are interested in aiperf are comparing it to the libraries listed below.
- Kubernetes CSI Driver for serving OCI model artifacts ☆27 · Updated this week
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆64 · May 22, 2026 · Updated last week
- ☆145 · May 8, 2026 · Updated 2 weeks ago
- ☆13 · Jun 18, 2024 · Updated last year
- High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆106 · Updated this week
- A simple example of running a MongoDB instance to query a database ☆10 · Aug 31, 2022 · Updated 3 years ago
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- Distributed KV cache scheduling & offloading libraries ☆149 · May 22, 2026 · Updated last week
- Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TP… ☆1,002 · Updated this week
- A Top-Down Profiler for GPU Applications ☆22 · Feb 29, 2024 · Updated 2 years ago
- ☆105 · May 31, 2025 · Updated 11 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆300 · May 14, 2026 · Updated 2 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆1,041 · May 22, 2026 · Updated last week
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated 3 months ago
- A tool for bandwidth measurements on NVIDIA GPUs. ☆700 · Apr 8, 2026 · Updated last month
- A Datacenter Scale Distributed Inference Serving Framework ☆6,941 · May 22, 2026 · Updated last week
- Offline optimization of your disaggregated Dynamo graph ☆307 · May 22, 2026 · Updated last week
- ☆25 · Jun 24, 2022 · Updated 3 years ago
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆212 · Updated this week
- llm-d helm charts and deployment examples ☆57 · May 1, 2026 · Updated 3 weeks ago
- This CLI tool and Python3 module collects the current system state for documentation ☆25 · Apr 9, 2026 · Updated last month
- A workload for deploying LLM inference services on Kubernetes ☆224 · Updated this week
- Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆456 · Updated this week
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆68 · Feb 12, 2026 · Updated 3 months ago
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆18 · Jan 12, 2026 · Updated 4 months ago
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 · Oct 15, 2018 · Updated 7 years ago
- ☆20 · Mar 11, 2026 · Updated 2 months ago
- Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing conf… ☆16 · May 17, 2026 · Updated last week
- A high-performance and light-weight router for vLLM large scale deployment ☆233 · May 6, 2026 · Updated 3 weeks ago
- Estimate MFU for DeepSeekV3 ☆26 · Jan 5, 2025 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆29 · May 21, 2026 · Updated last week
- ☆16 · May 19, 2026 · Updated last week
- ☆19 · Apr 21, 2026 · Updated last month
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆18 · Dec 19, 2024 · Updated last year
- An optimized Merkle Patricia Trie implementation on GPU, fully compatible with and integrable into Ethereum. The paper is published on VL… ☆14 · Apr 15, 2024 · Updated 2 years ago
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆291 · May 21, 2026 · Updated last week
- ☆18 · May 6, 2026 · Updated 3 weeks ago
- More reliable Video Understanding Evaluation ☆15 · Sep 23, 2025 · Updated 8 months ago
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Indus… ☆273 · May 21, 2026 · Updated last week