AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
☆253 · Apr 30, 2026 · Updated last week
Alternatives and similar repositories for aiperf
Users who are interested in aiperf are comparing it to the libraries listed below.
- Kubernetes CSI Driver for serving OCI model artifacts ☆25 · Apr 29, 2026 · Updated last week
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆56 · Updated this week
- High-performance KV cache storage for LLM inference — GPU offloading, SSD caching, and cross-node sharing via RDMA. Works with vLLM and S… ☆51 · Apr 28, 2026 · Updated last week
- ☆143 · Apr 28, 2026 · Updated last week
- This project demonstrates a decoupled real-time agent architecture that connects LangGraph agents to remote tools served by custom MCP (M… ☆26 · Jul 21, 2025 · Updated 9 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Aug 25, 2023 · Updated 2 years ago
- Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TP… ☆924 · Updated this week
- Distributed KV cache scheduling & offloading libraries ☆140 · Updated this week
- ☆99 · May 31, 2025 · Updated 11 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆1,011 · Apr 30, 2026 · Updated last week
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated 2 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆100 · Dec 2, 2025 · Updated 5 months ago
- A Datacenter Scale Distributed Inference Serving Framework ☆6,701 · Apr 30, 2026 · Updated last week
- ☆25 · Jun 24, 2022 · Updated 3 years ago
- A Kubernetes Operator to manage Node OS customizations. ☆52 · Apr 29, 2026 · Updated last week
- Tools for generating TPC-* datasets ☆31 · Jun 23, 2024 · Updated last year
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆66 · Feb 12, 2026 · Updated 2 months ago
- ☆20 · Mar 11, 2026 · Updated last month
- A Rust crate offering similar functionality to the Python transformers package using Candle. ☆14 · Nov 19, 2024 · Updated last year
- ☆105 · Sep 9, 2024 · Updated last year
- A high-performance and light-weight router for vLLM large scale deployment ☆214 · Apr 30, 2026 · Updated last week
- Tenstorrent Topology (TT-Topology) is a command line utility used to flash multiple NB cards on a system to use specific eth routing conf… ☆16 · Feb 26, 2026 · Updated 2 months ago
- Offline optimization of your disaggregated Dynamo graph ☆280 · Updated this week
- A tool for bandwidth measurements on NVIDIA GPUs. ☆689 · Apr 8, 2026 · Updated 3 weeks ago
- Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling ☆201 · Updated this week
- 🧠 ResNet: Deep Residual Learning for Image Recognition ☆10 · Sep 18, 2021 · Updated 4 years ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆294 · Apr 23, 2026 · Updated last week
- Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes ☆279 · Apr 30, 2026 · Updated last week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆440 · Updated this week
- ☆16 · Apr 29, 2026 · Updated last week
- Engine-agnostic LLM gateway in Rust. Full OpenAI & Anthropic API compatibility across SGLang, vLLM, TRT-LLM, OpenAI, Gemini & more. Indus… ☆212 · Updated this week
- ☆18 · Apr 22, 2026 · Updated 2 weeks ago
- 🚀 Get started in our repos ☆12 · Apr 25, 2026 · Updated last week
- Music and Artificial Intelligence ☆23 · Feb 17, 2019 · Updated 7 years ago
- ☆11 · Apr 11, 2019 · Updated 7 years ago
- A resume template written in typst, designed for zh_CN. ☆13 · Mar 3, 2025 · Updated last year
- A stateful serverless demo app running on AWS Lambda, using Apache Flink Stateful Functions ☆15 · Oct 13, 2020 · Updated 5 years ago
- 🦠 COVID-19 Daily Data from Worldometers with Python ☆13 · Feb 28, 2021 · Updated 5 years ago
- Nebula: Deep Neural Network Benchmarks in C++ ☆13 · Jan 2, 2025 · Updated last year