NVIDIA / topographLinks

A toolkit for discovering cluster network topology.

☆72

Alternatives and similar repositories for topograph

Users that are interested in topograph are comparing it to the libraries listed below

Sorting:

llm-d / llm-d-inference-scheduler
Inference scheduler for llm-d
☆99Updated last week
NVIDIA / k8s-nim-operator
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
☆131Updated last week
kubernetes-sigs / jobset
JobSet: a k8s native API for distributed ML training and HPC workloads
☆266Updated last week
kubernetes-sigs / inference-perf
GenAI inference performance benchmarking tool
☆105Updated this week
NVIDIA / knavigator
knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes.
☆70Updated 3 months ago
NVIDIA / k8s-dra-driver-gpu
NVIDIA DRA Driver for GPUs
☆458Updated this week
NVIDIA / grove
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
☆71Updated last week
project-codeflare / multi-cluster-app-dispatcher
Holistic job manager on Kubernetes
☆116Updated last year
NVIDIA / KAI-Scheduler
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
☆857Updated this week
kubernetes-sigs / gateway-api-inference-extension
Gateway API Inference Extension
☆495Updated this week
NVIDIA / go-gpuallocator
Go Abstraction for Allocating NVIDIA GPUs with Custom Policies
☆116Updated 3 weeks ago
kubernetes-sigs / dra-example-driver
Example DRA driver that developers can fork and modify to get them started writing their own.
☆94Updated last month
run-ai / fake-gpu-operator
☆151Updated 2 weeks ago
NVIDIA / nvkind
☆174Updated this week
llm-d / llm-d-kv-cache-manager
Distributed KV cache coordinator
☆78Updated last week
llm-d / llm-d-model-service
Simplified model deployment on llm-d
☆27Updated 3 months ago
kubernetes-sigs / lws
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
☆601Updated this week
kubernetes-sigs / wg-serving
WG Serving
☆30Updated this week
intel / platform-aware-scheduling
Enabling Kubernetes to make pod placement decisions with platform intelligence.
☆176Updated 8 months ago
BaizeAI / kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
☆33Updated last week
InftyAI / llmaz
☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work!
☆260Updated last week
NVIDIA / go-dcgm
Golang bindings for Nvidia Datacenter GPU Manager (DCGM)
☆134Updated last week
NVIDIA / mig-parted
MIG Partition Editor for NVIDIA GPUs
☆218Updated this week
sgl-project / ome
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
☆292Updated this week
NVIDIA / gpu-driver-container
The NVIDIA GPU driver container allows the provisioning of the NVIDIA driver through the use of containers.
☆135Updated last week
Mellanox / network-operator
NVIDIA Network Operator
☆284Updated this week
cncf-tags / container-device-interface
☆264Updated last week
leptonai / gpud
GPUd automates monitoring, diagnostics, and issue identification for GPUs
☆440Updated this week
modelpack / model-spec
Cloud Native Artifacial Intelligence Model Format Specification
☆107Updated this week
kubernetes-sigs / agent-sandbox
agent-sandbox enables easy management of isolated, stateful, singleton workloads, ideal for use cases like AI agent runtimes.
☆110Updated this week