Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
☆717 · Mar 25, 2026 · Updated this week
Alternatives and similar repositories for InferenceX
Users interested in InferenceX are comparing it to the repositories listed below.
- Automating analysis from trace files ☆64 · Updated this week
- Offline optimization of your disaggregated Dynamo graph ☆227 · Updated this week
- Kubernetes CSI Driver for serving OCI model artifacts ☆24 · Updated this week
- The Intelligent Inference Scheduler for Large-scale Inference Services ☆65 · Feb 12, 2026 · Updated last month
- Code for "What really matters in matrix-whitening optimizers?" ☆23 · Oct 31, 2025 · Updated 4 months ago
- ☆12 · Apr 4, 2022 · Updated 3 years ago
- ☆32 · Apr 19, 2025 · Updated 11 months ago
- ☆16 · Jul 8, 2024 · Updated last year
- ☆10 · Mar 2, 2024 · Updated 2 years ago
- ☆19 · Nov 10, 2023 · Updated 2 years ago
- A lightweight, general-purpose framework for evaluating GPU kernel correctness and performance ☆49 · Updated this week
- A Rust reimplementation of genai-bench for benchmarking LLM serving systems at high concurrency with accurate timing and industry-standar… ☆284 · Updated this week
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆182 · Mar 20, 2026 · Updated last week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆404 · Updated this week
- A tool for debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… ☆19 · Sep 17, 2025 · Updated 6 months ago
- Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vLLM + GuideLLM + Optuna) ☆50 · Mar 17, 2026 · Updated last week
- ☆46 · Mar 20, 2026 · Updated last week
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated last month
- ☆12 · Jul 24, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 · Oct 10, 2025 · Updated 5 months ago
- See the Wiki page below for details about the SR-IOV patch set ☆20 · Sep 28, 2021 · Updated 4 years ago
- NVIDIA Inference Xfer Library (NIXL) ☆945 · Mar 20, 2026 · Updated last week
- ☆16 · Sep 24, 2024 · Updated last year
- ☆128 · Updated this week
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes ☆2,657 · Updated this week
- Like `kubectl get all`, but get really all resources ☆29 · Mar 20, 2026 · Updated last week
- ☆16 · Nov 24, 2025 · Updated 4 months ago
- ☆82 · Feb 12, 2026 · Updated last month
- LLM training parallelisms (DP, FSDP, TP, PP) in pure C ☆26 · Jan 27, 2026 · Updated last month
- A signal processing library in Rust, with the goal of being a decent alternative to Matlab's Signal Processing Toolbox and scipy.signal ☆19 · Jan 31, 2026 · Updated last month
- vLLM Daily Summarization of Merged PRs ☆48 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆6,347 · Mar 20, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Dec 4, 2025 · Updated 3 months ago
- [HotStorage '24] Can ZNS SSDs be Better Storage Devices for Persistent Cache? ☆12 · Jun 14, 2024 · Updated last year
- Building the Virtuous Cycle for AI-driven LLM Systems ☆204 · Updated this week
- FPGA Labs for EECS 151/251A (Fall 2021) ☆11 · Oct 20, 2021 · Updated 4 years ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆935 · Updated this week
- Notepad++ Jade syntax highlighter ☆15 · Jun 25, 2016 · Updated 9 years ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago