Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
☆797 · Apr 15, 2026 · Updated this week
Alternatives and similar repositories for InferenceX
Users interested in InferenceX are comparing it to the libraries listed below.
- Automating analysis from trace files ☆66 · Updated this week
- Offline optimization of your disaggregated Dynamo graph ☆255 · Updated this week
- Kubernetes CSI Driver for serving OCI model artifacts ☆25 · Mar 23, 2026 · Updated 3 weeks ago
- The Intelligent Inference Scheduler for Large-scale Inference Services ☆66 · Feb 12, 2026 · Updated 2 months ago
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆215 · Apr 8, 2026 · Updated last week
- ☆12 · Apr 4, 2022 · Updated 4 years ago
- See the Wiki page below for details about the SR-IOV patch set ☆20 · Sep 28, 2021 · Updated 4 years ago
- ☆32 · Apr 19, 2025 · Updated 11 months ago
- ☆16 · Jul 8, 2024 · Updated last year
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆291 · Apr 2, 2026 · Updated 2 weeks ago
- NVIDIA Inference Xfer Library (NIXL) ☆970 · Updated this week
- Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, T… ☆422 · Updated this week
- The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs using Nvidia libraries such as… ☆19 · Sep 17, 2025 · Updated 6 months ago
- A Quirky Assortment of CuTe Kernels ☆924 · Updated this week
- Auto-tuning for vLLM: getting the best performance out of your LLM deployment (vllm + guidellm + optuna) ☆51 · Mar 17, 2026 · Updated 3 weeks ago
- ☆47 · Updated this week
- Repository for AI model benchmarking on TT-Buda ☆16 · Feb 9, 2026 · Updated 2 months ago
- Achieve state-of-the-art inference performance with modern accelerators on Kubernetes ☆2,957 · Updated this week
- ☆12 · Jul 24, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆13 · Oct 10, 2025 · Updated 6 months ago
- Scoreboard for ONNX Backend Compatibility ☆29 · Updated this week
- A Datacenter Scale Distributed Inference Serving Framework ☆6,527 · Updated this week
- Like `kubectl get all`, but get really all resources ☆30 · Apr 8, 2026 · Updated last week
- GPU price aggregator for cloud providers ☆49 · Apr 7, 2026 · Updated last week
- ☆16 · Nov 24, 2025 · Updated 4 months ago
- A distributed in-memory store for temporal knowledge graphs ☆10 · Mar 20, 2024 · Updated 2 years ago
- ☆85 · Feb 12, 2026 · Updated 2 months ago
- LLM training parallelisms (DP, FSDP, TP, PP) in pure C ☆28 · Jan 27, 2026 · Updated 2 months ago
- FlyDSL is the Python front-end of the project: Flexible LaYout DSL ☆148 · Updated this week
- vLLM Daily Summarization of Merged PRs ☆49 · Apr 8, 2026 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Dec 4, 2025 · Updated 4 months ago
- Portable NIC Architecture ☆60 · Feb 15, 2024 · Updated 2 years ago
- [HotStorage '24] Can ZNS SSDs be Better Storage Devices for Persistent Cache? ☆12 · Jun 14, 2024 · Updated last year
- Building the Virtuous Cycle for AI-driven LLM Systems ☆217 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆987 · Updated this week
- A command-line utility to manage the configuration of a system's high-performance network interfaces for RoCE deployments ☆36 · Jul 25, 2023 · Updated 2 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments ☆95 · Jan 16, 2026 · Updated 3 months ago
- LLMPerf is a library for validating and benchmarking LLMs ☆1,103 · Dec 9, 2024 · Updated last year
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago