AI-Hypercomputer / inference-benchmark
☆17Updated 4 months ago
Alternatives and similar repositories for inference-benchmark
Users that are interested in inference-benchmark are comparing it to the libraries listed below
 - GenAI inference performance benchmarking tool☆107Updated this week
 - WG Serving☆30Updated 2 weeks ago
 - knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.☆70Updated 3 months ago
 - The main purpose of runtime copilot is to assist with node runtime management tasks such as configuring registries, upgrading versions, i…☆12Updated 2 years ago
 - agent-sandbox enables easy management of isolated, stateful, singleton workloads, ideal for use cases like AI agent runtimes.☆126Updated this week
 - Distributed KV cache coordinator☆80Updated this week
 - An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.☆130Updated last week
 - Incubating P/D sidecar for llm-d☆16Updated this week
 - llm-d helm charts and deployment examples☆45Updated last month
 - GPU analyzer for Kubernetes GPU clusters☆17Updated 5 years ago
 - d.run website☆15Updated this week
 - Helm charts for llm-d☆50Updated 3 months ago
 - ☆38Updated this week
 - A toolkit for discovering cluster network topology.☆74Updated this week
 - Inference scheduler for llm-d☆99Updated last week
 - More Flexible Device Extension Capability in Kubernetes (DevicePlugins++)☆23Updated 2 years ago
 - A set of system-oriented validators for kubeadm preflight checks.☆37Updated last week
 - Holistic job manager on Kubernetes☆116Updated last year
 - Example DRA driver that developers can fork and modify to get them started writing their own.☆96Updated this week
 - 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.☆33Updated 2 weeks ago
 - Cloud Native Artificial Intelligence Model Format Specification☆109Updated this week
 - OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)☆301Updated this week
 - A workload for deploying LLM inference services on Kubernetes☆93Updated this week
 - A simulator of Kubernetes for batch and service workloads.☆49Updated 4 years ago
 - Cloud Native Benchmarking of Foundation Models☆44Updated 3 months ago
 - ☆178Updated 2 weeks ago
 - Batch scheduler based on the K8s scheduling framework; related features have been contributed to scheduler-plugins (Deprecated).☆25Updated 5 years ago
 - ☆40Updated last month
 - hub / spoke registration controllers☆42Updated last year
 - 💫 A lightweight p2p-based cache system for model distributions on Kubernetes. Reframing now to make it a unified cache system with POSI…☆24Updated 10 months ago