AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
☆215 · Apr 8, 2026 · Updated last week
Alternatives and similar repositories for aiperf
Users that are interested in aiperf are comparing it to the libraries listed below.
- High-performance KV cache storage for LLM inference: GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆42 · Updated this week
- ☆141 · Apr 8, 2026 · Updated last week
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- Distributed KV cache scheduling & offloading libraries ☆126 · Updated this week
- Rust (embedded-hal) driver for the HZ Grow R502 capacitive fingerprint sensor ☆18 · Sep 27, 2020 · Updated 5 years ago
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated 2 months ago
- ☆25 · Jun 24, 2022 · Updated 3 years ago
- Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TP… ☆797 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) ☆970 · Updated this week
- llm-d helm charts and deployment examples ☆54 · Apr 2, 2026 · Updated 2 weeks ago
- Like `kubectl get all`, but get really all resources ☆30 · Apr 8, 2026 · Updated last week
- A workload for deploying LLM inference services on Kubernetes ☆203 · Updated this week
- Tools for generating TPC-* datasets ☆31 · Jun 23, 2024 · Updated last year
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆66 · Feb 12, 2026 · Updated 2 months ago
- A Datacenter Scale Distributed Inference Serving Framework ☆6,527 · Updated this week
- ☆20 · Mar 11, 2026 · Updated last month
- Offline optimization of your disaggregated Dynamo graph ☆255 · Updated this week
- Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing conf… ☆16 · Feb 26, 2026 · Updated last month
- Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes ☆261 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆27 · Apr 9, 2026 · Updated last week
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆291 · Apr 2, 2026 · Updated 2 weeks ago
- Open Model Engine (OME): Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆422 · Updated this week
- ☆16 · Apr 9, 2026 · Updated last week
- wentao.site / Hugo Template / A template repository for Hugo based blog ☆55 · Mar 21, 2026 · Updated 3 weeks ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆18 · Dec 19, 2024 · Updated last year
- An optimized Merkle Patricia Trie implementation on GPU, fully compatible with and integrable into Ethereum. The paper is published on VL… ☆14 · Apr 15, 2024 · Updated 2 years ago
- ☆18 · Mar 25, 2026 · Updated 3 weeks ago
- ☆13 · Jul 10, 2024 · Updated last year
- ☆11 · Apr 11, 2019 · Updated 7 years ago
- A stateful serverless demo app running on AWS Lambda, using Apache Flink Stateful Functions ☆15 · Oct 13, 2020 · Updated 5 years ago
- ☆98 · May 31, 2025 · Updated 10 months ago
- Implementation of the paper: "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in pytorch ☆14 · Updated this week
- Nebula: Deep Neural Network Benchmarks in C++ ☆13 · Jan 2, 2025 · Updated last year
- ☆14 · Jul 13, 2025 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Apr 10, 2026 · Updated last week
- A shell script for creating a new emqx node for an existing one ☆12 · Sep 14, 2022 · Updated 3 years ago
- ☆12 · Oct 1, 2024 · Updated last year
- A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h… ☆113 · Apr 9, 2026 · Updated last week
- Frontend integration for PyTorch with tt-mlir ☆23 · Mar 2, 2026 · Updated last month