AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
☆182 · Mar 20, 2026 · Updated last week
Alternatives and similar repositories for aiperf
Users that are interested in aiperf are comparing it to the libraries listed below.
- Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i… ☆40 · Mar 20, 2026 · Updated last week
- Open-source library for scalable, reproducible evaluation of AI models and benchmarks. ☆240 · Mar 20, 2026 · Updated last week
- ☆138 · Mar 13, 2026 · Updated 2 weeks ago
- This project demonstrates a decoupled real-time agent architecture that connects LangGraph agents to remote tools served by custom MCP (M… ☆24 · Jul 21, 2025 · Updated 8 months ago
- Distributed KV cache scheduling & offloading libraries ☆117 · Mar 20, 2026 · Updated last week
- Example of applying CUDA graphs to LLaMA-v2 ☆12 · Aug 25, 2023 · Updated 2 years ago
- Kubernetes CSI Driver for serving OCI model artifacts ☆24 · Updated this week
- Text files from Aozora Bunko (青空文庫) ☆14 · Feb 4, 2024 · Updated 2 years ago
- llm-d helm charts and deployment examples ☆50 · Updated this week
- A Kubernetes Operator to manage Node OS customizations. ☆48 · Mar 19, 2026 · Updated last week
- A workload for deploying LLM inference services on Kubernetes ☆192 · Updated this week
- The Intelligent Inference Scheduler for Large-scale Inference Services. ☆65 · Feb 12, 2026 · Updated last month
- Tools for generating TPC-* datasets ☆31 · Jun 23, 2024 · Updated last year
- Offline optimization of your disaggregated Dynamo graph ☆227 · Updated this week
- Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes ☆129 · Mar 20, 2026 · Updated last week
- Sparse-dense matrix-matrix multiplication on GPUs ☆14 · Oct 15, 2018 · Updated 7 years ago
- ☆18 · Mar 11, 2026 · Updated 2 weeks ago
- A Datacenter Scale Distributed Inference Serving Framework ☆6,347 · Mar 20, 2026 · Updated last week
- NVIDIA Inference Xfer Library (NIXL) ☆945 · Mar 20, 2026 · Updated last week
- A Rust reimplementation of genai-bench for benchmarking LLM serving systems at high concurrency with accurate timing and industry-standar… ☆284 · Updated this week
- ☆16 · Mar 20, 2026 · Updated last week
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆18 · Dec 19, 2024 · Updated last year
- An optimized Merkle Patricia Trie implementation on GPU, fully compatible with and integrable into Ethereum. The paper is published on VL… ☆14 · Apr 15, 2024 · Updated last year
- ☆17 · Mar 4, 2026 · Updated 3 weeks ago
- A branch of marss with DRAMSim hooks ☆18 · Aug 22, 2013 · Updated 12 years ago
- ☆11 · Apr 11, 2019 · Updated 6 years ago
- A resume template written in typst, designed for zh_CN. ☆13 · Mar 3, 2025 · Updated last year
- Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TP… ☆717 · Updated this week
- ☆94 · May 31, 2025 · Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆16 · Mar 20, 2026 · Updated last week
- A shell script for creating a new emqx node for an existing one ☆12 · Sep 14, 2022 · Updated 3 years ago
- ☆12 · Oct 1, 2024 · Updated last year
- An Envoy inspired, ultimate LLM-first gateway for LLM serving and downstream application developers and enterprises ☆26 · Apr 24, 2025 · Updated 11 months ago
- Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU ☆20 · Mar 15, 2026 · Updated last week
- helm charts for deploying models with llm-d ☆29 · Mar 17, 2026 · Updated last week
- Distributed ML Optimizer ☆35 · Jul 28, 2021 · Updated 4 years ago
- Tenstorrent Firmware repository ☆24 · Feb 25, 2026 · Updated last month
- Notebooks for Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations ☆15 · Oct 3, 2019 · Updated 6 years ago
- Test Orchestrator for Performance and Scalability of AI pLatforms ☆16 · Mar 20, 2026 · Updated last week