IBM / Bridge-Operator
Bridge operator repo
☆21 · Updated 2 months ago
Alternatives and similar repositories for Bridge-Operator
Users interested in Bridge-Operator are comparing it to the libraries listed below.
- A tool to detect infrastructure issues on cloud native AI systems ☆42 · Updated last month
- MIG Partition Editor for NVIDIA GPUs ☆204 · Updated last week
- llm-d benchmark scripts and tooling ☆18 · Updated this week
- Holistic job manager on Kubernetes ☆117 · Updated last year
- Health checks for Azure N- and H-series VMs. ☆46 · Updated 2 weeks ago
- Cloud Native Benchmarking of Foundation Models ☆38 · Updated last month
- ☆62 · Updated last week
- Home of the HPC Compatible Kubernetes Integration for IBM Spectrum LSF ☆42 · Updated 4 years ago
- ☆253 · Updated 3 weeks ago
- knavigator is a development, testing, and optimization toolkit for AI/ML scheduling systems at scale on Kubernetes. ☆68 · Updated 2 months ago
- NVIDIA Network Operator ☆263 · Updated this week
- An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment. ☆117 · Updated this week
- A Slurm cluster for Kubernetes ☆61 · Updated 11 months ago
- NVIDIA NCCL Tests for Distributed Training ☆97 · Updated 3 weeks ago
- Device plugins for Volcano, e.g. GPU ☆125 · Updated 3 months ago
- Systematic and comprehensive benchmarks for LLM systems. ☆19 · Updated 2 weeks ago
- ☆280 · Updated last week
- 🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads. ☆30 · Updated last week
- A toolkit for discovering cluster network topology. ☆56 · Updated last week
- Go Abstraction for Allocating NVIDIA GPUs with Custom Policies ☆115 · Updated 2 weeks ago
- Singularity implementation of k8s operator for interacting with SLURM. ☆117 · Updated 4 years ago
- Prometheus exporter for an InfiniBand fabric ☆63 · Updated last year
- A simulator of Kubernetes for batch and service workloads. ☆47 · Updated 4 years ago
- Bitfusion with Kubernetes Integration Support ☆50 · Updated last year
- Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) ☆487 · Updated 2 months ago
- GenAI inference performance benchmarking tool ☆66 · Updated this week
- ☆124 · Updated last week
- OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) ☆174 · Updated this week
- RDMA CNI plugin for containerized workloads ☆55 · Updated 3 weeks ago
- GPU plugin to the node feature discovery for Kubernetes ☆301 · Updated last year