ai-dynamo/grove

Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

☆238

Alternatives and similar repositories for grove

Users who are interested in grove are comparing it to the repositories listed below.

NVIDIA / k8s-nim-operator
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆159 · Updated this week
kai-scheduler / KAI-Scheduler
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆1,384 · Updated this week
run-ai / runai-model-streamer
☆324 · Updated this week
Tandemn-Labs / tandemn-system
Tandemn's server is the core orchestration engine that deploys, schedules, and optimizes large-scale AI inference workloads across hetero…
☆22 · Updated this week
run-ai / fake-gpu-operator
☆294 · Jul 5, 2026 · Updated last week
ai-dynamo / modelexpress
Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and i…
☆81 · Updated this week
NVIDIA / knavigator
knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.
☆79 · Jul 6, 2026 · Updated last week
NVIDIA / topograph
A toolkit for discovering cluster network topology.
☆140 · Updated this week
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆759 · Updated this week
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,133 · Updated this week
project-codeflare / multi-cluster-app-dispatcher
Holistic job manager on Kubernetes
☆117 · Feb 20, 2024 · Updated 2 years ago
run-ai / kwok-operator
☆20 · Jul 3, 2026 · Updated last week
kerthcet / github-workflow-as-kube
Following the same workflows as Kubernetes. Widely used in InftyAI community.
☆13 · May 31, 2026 · Updated last month
NVIDIA / gpu-usage-monitor
A comprehensive Helm chart for monitoring GPU resources in Kubernetes clusters. This tool provides real-time visibility into GPU allocati…
☆27 · Jun 30, 2026 · Updated 2 weeks ago
llm-d / llm-d-inference-sim
A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual h…
☆162 · Jul 8, 2026 · Updated last week
infinigence / FUSCO
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
☆123 · Mar 7, 2026 · Updated 4 months ago
kubernetes-sigs / wg-serving
WG Serving
☆38 · Mar 24, 2026 · Updated 3 months ago
wassemgtk / MegaScale-Infer-Prototyp
Prototyp MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
☆31 · Apr 4, 2025 · Updated last year
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆712 · Updated this week
volcano-sh / resource-exporter
Resource Exporter for volcano scheduling, e.g. NUMA-Aware scheduling.
☆19 · Jul 8, 2026 · Updated last week
vllm-project / vllm-skills
Agent skills for vLLM
☆88 · Apr 3, 2026 · Updated 3 months ago
kubernetes-sigs / dra-driver-cpu
CPU DRA Driver
☆58 · Updated this week
llm-d / llm-d-kv-cache
Distributed KV cache scheduling & offloading libraries
☆161 · Updated this week
llm-d / llm-d-model-service
Simplified model deployment on llm-d
☆29 · Jul 2, 2025 · Updated last year
kubernetes-sigs / dra-driver-nvidia-gpu
DRA Driver for NVIDIA GPUs
☆674 · Updated this week
vllm-project / router
A high-performance and light-weight router for vLLM large scale deployment
☆309 · Updated this week
ai-dynamo / aiperf
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu…
☆441 · Updated this week
openshift / instaslice-operator
InstaSlice Operator facilitates slicing of accelerators using stable APIs
☆51 · Updated this week
kubernetes / kms
Kubernetes KMS implementation
☆27 · Updated this week
run-ai / jupyterlab_genv
GPU Environment Management for JupyterLab
☆26 · Feb 19, 2024 · Updated 2 years ago
InftyAI / Manta
💫 A lightweight p2p-based cache system for model distributions on Kubernetes. Reframing now to make it an unified cache system with POSI…
☆27 · Dec 6, 2024 · Updated last year
llm-d / llm-d
Achieve state of the art inference performance with modern accelerators on Kubernetes
☆3,798 · Updated this week
openshift / cluster-resource-override-admission-operator
Operator for the mutating admission webhook for ClusterResourceOverride
☆19 · Jul 3, 2026 · Updated last week
berkmancenter / adf
Augmented Dickey-Fuller implementation in Go
☆12 · Mar 15, 2019 · Updated 7 years ago
gardener-attic / vpa-exporter
[DEPRECATED] Prometheus exporter for VPA recommendations
☆12 · Aug 22, 2023 · Updated 2 years ago
NVIDIA / go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
☆123 · Updated this week
SymbioticLab / tensorflow-salus
tensorflow fork with Salus integration
☆12 · Jan 7, 2022 · Updated 4 years ago
LMCache / LMBenchmark
Systematic and comprehensive benchmarks for LLM systems.
☆62 · Jan 28, 2026 · Updated 5 months ago
kubernetes-sigs / dra-example-driver
Example DRA driver that developers can fork and modify to get them started writing their own.
☆135 · Updated this week